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SUMMARY 


This  is  a  final  report  on  the  research  project  supported  by  the  Naval  Research  Laboratory 
under  Grant  N00014-87-K-2018,  entitled  Development  of  Improved  Modeling  and  Analysis 
Techniques  for  Dynamics  of  Shell  Structures,  which  covered  the  period  of  09  June  1987 
to  08  June  1990.  The  objectives  of  the  research  have  been:  1)  to  develop  modeling  and 
computational  techniques  suitable  for  the  dynamic  analysis  of  naval  shell  structures,  and 
2)  to  investigate,  implement  and  evaluate  tools  for  concurrent  processing  of  very  large 
structural  engineering  problems  on  the  Connection  Machine. 


I.  Research  Accomplishments 
Task  1:  Shell  Structural  Modeling  Techniques 

This  task  consists  of  two  related  efforts:  1)  improvement  of  the  ANS  shell  elements  (Refer¬ 
ences  1  through  4)  to  better  capture  the  coupling  effects  of  mambrane-bending,  membrane- 
transverse  shear,  and  bending-transverse  shear  phenomena;  2)  modeling  techniques  for 
improving  the  modes  and  mode  shapes  of  ‘dry’  shell  structures,  particularly  for  the  inter¬ 
mediate  frequency  ranges. 

As  a  results  of  the  first  effort,  Aashell  element  software  module  was  implemented  and 
delivered  to  NRL  and  its  theoretical  aspects  was  documented  in  References  13  and  14. 
Specifically,  the  new  version  of  the  ANS  shell  elements  pass  the  patch  test  and  consideiably 
streamlined,  resulting  in  substantial  computational  efficiency. 

Regarding  the  modeling  of  shell  structures  by  the  finite  elements  for  accurate  intermediate- 
frequency  computations,  our  initial  effort  began  with  tailoring  of  the  mass  matrices  as 
documented  in  Reference  7.  Even  though  such  mass-matrix  tailoring  gave  rise  to  a  signifi¬ 
cant  improvement  of  low-frequency  computations,  it  fell  short  of  yielding  any  appreciable 
improvement  on  intermediate  to  high-frequency  computations.  This  has  led  us  to  tailor 
not  only  the  mass  matrices  but  also  a  componcnt-by-compoiicnt  tailoring  (T stiffness  matri¬ 
ces.  For  example,  the  tailored  stiffness  matrix  consists  of  the  tailored  nretnbranc,  tailored 
bending  and  tailored  transverse  shear  stiffness  matrices.  The  syrrthesis  to  realize  such  a 
tailoring  was  facilitated  by  the  use  of  the  symbolic  analysis  technique  dc\ eloped  previously 
in  References  8  through  11. 
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The  so-called  frequency-window  tailoring  of  finite  element  modelb  as  uMplied  to  burs  and 
beams  (Reference  12)  demonstrates  that  it  can  accurately  obtain  very  high-frequency  com¬ 
ponents  with  relatively  coarse  finite  element  grids,  about  a  factor  of  five  to  ten  times  larger 
element  size  that  has  been  possible  in  conventional  finite  element  modeling.  This  improve¬ 
ment,  if  proved  to  be  the  case  for  general  shells,  can  have  a  significant  impact  on  the  finite 
element  modeling  capability  of  structural  acoustics  problems  in  the  future. 

Task  2:  Parallel  Computations  on  the  Connection  Machine 

This  task  hcis  focused  on  the  use  of  the  Connection  Machine  as  applied  to  the  explicit  tran¬ 
sient  analysis  of  ‘dry’  shell  structures.  Our  experience  has  been  documented  in  References 
5  and  6.  In  addition,  a  framebuffer  generated  visualization  of  the  transient  analysis  of  a 
generic  submarine  structure  was  produced  as  a  video  tape  and  delivered  to  the  NRL  tech¬ 
nical  monitor.  Specifically,  our  effort  concentrated  on  the  development  of  several  modules 
using  the  C*  language  provided  by  the  Thinking  Machine  Corporation.  The  modules  de¬ 
veloped  so  far  include:  decomposer  which  takes  as  input  an  arbitrary  mesh  description,  and 
produces  a  set  of  finite  element  data  structures  that  can  be  loaded  within  one  generic  CkI2 
chip;  mapper  that  assigns  each  of  the  data  structures  produced  by  the  decomposer  to  a  well 
defined  chip;  residual  evaluator  that  controls  the  direct  calculation  of  element  residuals, 
where  “direct”  means  that  no  element  stiffness  matrices  are  evaluated;  and  element  library 
that  includes  a  3D  2-node  truss,  a  3D  2-node  Bernouilli  beam,  a  3D  2-node  Timoshenko 
beam,  a  3D  8-node  brick,  a  2D  4-node  quadrilateral  and  a  4-node  ANS  shell  element.  These 
modules,  when  interfaced  with  visualization  kernel  that  was  developed  under  AFOSR  and 
NSF  grants  greatly  facilitates  the  understanding  of  the  computed  results. 

Our  experience  so  far  indicates  that  this  highly  parallel  processor  can  outperform  vector 
supercomputers  such  as  the  CRAY  family  on  explicit  computations  but  not  on  implicit 
ones.  Based  on  the  observations  obtained  during  the  present  study,  the  following  is  a 
summary  of  the  key  conclusions: 

(1)  The  current  CM2  processor  memory  size  of  64  Kbits  penalizes  high  order  elements 
in  the  sense  that  only  small  VP  (virtual  processor)  ratios  can  be  achieved.  Thus  the 
current  configuration  favors  simpler  elements.  (This  restriction  should  disappear  in 
future  CM2  models  which  will  have  iMbit  of  memory  per  processor.) 

(2)  Mesh  irregularities  slow  down  the  computation  speed  in  various  ways. 

(3)  The  Data  Vault  is  very  effective  at  reducing  I/O  time. 

(4)  The  Virtual  Processor  concept  outperforms  substructuring. 
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ABSTRACT 

A  frequency-window  tailoring  technique  is  proposed  for  improved  finite  clement  modeling 
of  structures  for  frequencies  and  their  mode  shapes  in  the  acoustic  range.  The  technique 
is  based  on  the  tmloring  of  three  element  attributes:  frequenc>  -tailored  mass  matrix,  en¬ 
hancement  of  stiffness  matrix  by  a  weighted  spectral  decomposition  of  membrane,  bending 
and  transverse  shear  energy  for  a  desired  frequency  range  (window),  and  a  discrete  Fourier 
synthesis  of  the  resulting  elemental  eigenproblem  models.  The  proposed  technique  has  been 
applied  to  the  vibration  problems  of  bars  and  beams,  which  illustrate  the  cfFectiveness  of 
the  technique  over  conventional  finite  element  modeling  techniques. 
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1.0  Introduction 

The  response  accuracy  of  finite  element  methods  applied  to  linear  structural  dynamics 
problems  is  a  fuiiction  of  both  the  finite  element  spatial  discretization  and  the  time  domain 
integration  technique  applied  to  the  coupled  ordinary  differential  equations  of  the  discrete 
model.  Traditional  techniques  in  improving  the  spatial  discretization  obtained  via  the 
finite  element  method  are  dominated  by  the  so-called  h-refinements  and  p- refinements. 

In  the  first  approach,  the  element  mathematical  formulation  is  held  fixed  while  the  number 
of  elements  (eind  number  of  global  variables)  is  increased  to  obtain  the  desired  spatial 
accuracy.  The  2>-refinement,  in  contrast,  holds  the  number  of  elements  constant  while 
increasing  the  order  of  the  displacement  field  interpolations  within  the  element.  This 
also  leads  to  an  increase  in  the  number  of  \^iables,  but  alters  the  fundamental  element 
behavior.  For  example,  one  might  refine  a  simple  one-element  truss  model  (i.e.  spring 
element)  by  introducing  a  mid-point  node.  With  a  h-refinement.  we  would  change  the 
model  from,  one  linear-displacement  element  to  two  linear  elements  which  share  the  mid¬ 
point  node.  A  p- refinement,  on  the  other  hand,  would  exploit  the  additional  node  to  replace 
the  linear  element  with  a  single  three-node  bar  element  employing  quadratic  interpolations 
of  the  internal  displacement  field. 

A  common  limiting  factor  for  both  of  these  refinements  is  that,  once  the  grid  sizes  and  ap¬ 
proximate  interpolation  functions  are  decided  upon,  the  accuracy  that  can  accrue  from  the 
resulting  discrete  model  is  fixed.  Specifically,  while  the  convergence  of  the  low  frequencies 
and  their  mode  shapes  is  in  general  assured  as  the  grids  and/or  interpolation  order  are 
increased,  there  has  been  lack  of  a  systematic  convergence  measurem.cnt  for  frequencies 
and  their  mode  shapes  ranging  from  intermediate  to  acoustics  components.  Consequently, 
this  lack  of  high-frequency  convergence  assessment  has  led  tj  the  belief  that  it  is  hopeless 
to  capture  with  high  accuracy  an  acoustic  range  of  frequencies  and  their  mode  shapes  by 
the  finite  element  approach  within  feasible  computational  means. 

The  development  of  consistent  mass  discretization  [1],  however,  has  motivated  a  number 
of  investigators  to  study  the  wave  dispersion  characteristics  of  various  mass  modeling 
procedures  for  finite  element  analysis  (2-11).  These  efforts  hhave  included  cissessments  of 
mass  lumping  for  both  constant-strain  and  higher- order  elements,  and  point  out  clearly  how 
mass  modeling,  independent  of  mesh  size  and  displacement  interpolation,  can  significantly 
affect  model  accuracy.  Park  and  Jensen  (12)  use  wave  dispersion  analysis  to  provide  a 
systematic  relatiionship  between  lumped  and  consistent  mass  discretizations,  and  show 
how  averaging  or  tailoring  the  mass  lumping  can  significantly  improve  resp  onse  accuracy 
at  specific  higher  frequencies.  This  approacli  is  not  adequate,  however,  for  obtaining 
highly  accurate  acoustic  frequency  ranges,  and  furthermore  leads  to  non  ili.ngonal  mas.s 
matrices,  which  are  undesirable  for  simulation  via  explicit  time  integration  mcthod.s  on 
massively-parallel  computers.  Thus,  there  remains  a  need  for  finite  element  approximations 
which  accurately  capture  acoustic  components  wliilc  ideally  maintaining  a  iliagonal  mass 
coefficient  for  computational  considerations. 
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The  present  paper  can  be  viewed  as  an  initiad  attempt  to  fill  this  void  so  that  a  method  that 
can  eventually  lead  to  adequate  finite  element  modeling  of  the  acoustic  range  frequencies 
and  their  mode  shapes.  To  this  end,  we  retain  the  two  conventional  model  improvement 
techniques,  viz.,  the  h  and  p-refinements,  but  not  to  their  extreme.  We  introduce  a  third 
component,  which  first  breaks  down  the  elemented  attributes,  parameterizes  those  decom¬ 
posed  attributes,  and  then  recombines  them  based  on  a  discrete  Fourier  synthesis  so  that 
the  discrete  characteristic  dispersion  curves  match,  for  a  specified  range  of  frequencies,  tis 
closely  as  possible  to  those  of  the  continuum  case,  hence  the  name  frequency-window  tai¬ 
loring  technique.  The  elemental  attributes  usually  consist  of  mass  matrices  of  linear  and 
quadratic  interpolations,  stiffness  matrices  of  membrane,  bending  and  transverse  shear 
components  of  constant  and  lineax  strain  interpolations.  It  will  be  subsequently  shown 
that  the  piesent  frequency-window  tailoring  technique  yields  dramatically  improved  vibra¬ 
tion  analysis  performance  beyond  either  of  the  traditional  methods  for  the  same  mesh  size, 
thus  providing  an  adequate  accuracy  with  an  affordable  mesh  refinement.  The  rest  of  the 
paper  is  ogranized  as  follows. 

Section  2  briefly  reviews  the  discrete  Fourier  analysis  technique  [13,14]  as  applied  to  two- 
noded  and  three-noded  bar  elements.  The  discrete  dispersion  curves  are  then  compared 
with  that  of  the  continuum  cast'.  The  establishment  of  this  comparison  forms  the  basis 
for  the  present  element  synthesis  or  frequency -window  tailoring  technique.  In  addition, 
a  similar  discrete  Fourier  analysis  of  a  Timoshenko  beam  and  its  correlation  with  the 
continuum  case  is  presented.  Thus,  discretization  accuracy  of  not  only  the  membrane  but 
edso  the  bending  and  transverse  shear  phenomena  can  be  assessed  in  a  quantitive  manner. 
Of  particular  interest  from  these  analyses  is  the  appearance  of  a  pronounced  jump  in  the 
frequency  for  the  case  of  the  quadratic  bar  element,  and  also  for  the  case  of  the  quadratic 
Timoshenko  beam,  although  not  as  much  pronounced.  These  jumps  occur  at  the  mode 
shape  that  corresponds  to  =  i:  12  where  k  is  the  wave  number  and  £  is  the  elemental 
length,  which  is  clearly  at  the  admissible  wave  number  range. 

A  parameterized  tailoring  of  the  bar  element  discretization  is  described  in  Section  3.  For 
the  case  of  the  bar,  a  diagonal  mass  is  first  constructed  as  a  linear  combination  of  the  linear 
and  quadratic  elements.  A  parameterized  stiffness  matrix  is  then  constructed  in  a  similar 
manner.  These  two  parameters  are  then  determined  by  requiring,  for  a  specified  range  of 
frequencies,  the  discrete  frequencies  as  close  as  possible  to  those  of  the  continuum  case. 
Encouraged  by  the  success  of  the  bar  synthesis,  a  similar  synthesis  technique  is  applied  to 
beam  elements.  This  is  carried  out  by  introducing  a  three-parameter  optimization  process, 
viz.,  the  mass  parameter,  the  bending  parameter  and  the  transverse  shear  parameter. 
These  results  are  reported  in  Section  4.  Finally,  discussions  and  some  concluding  remarks 
Eire  offered  in  Section  5. 
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2.0  Discrete  Fourier  Analysis  in  Vibration  of  Finite  Elexnents 

The  basic  analysis  tool  that  we  are  about  to  employ  throughout  the  paper  is  the  discrete 
Fourier  method  [12-14].  There  are  two  important  properties  of  the  discrete  Fourier  method 
that  are  attractive  for  the  present  purposes:  symbolic  representation  of  the  element  at¬ 
tributes  and  the  direct  comparison  of  the  discrete  dispersion  curves  with  the  corresponding 
continuum  ones.  We  now  illustrate  the  method,  for  the  saJce  of  clarity  and  simplicity,  by 
way  of  one-dimensional  bar  problems. 


2.1  Fourier  Analysis  of  Bar  Elements 


Consider  the  governing  partial  differential  equation  for  a  uniform  elastic  bar  given  by 


d'^u  „d‘^u 


(1) 


where  p  is  the  meiss  density,  E  is  Young’s  modulus,  u  is  the  axial  displacement  variable, 
and  t  and  x  are  time  and  the  axial  position  along  the  bar,  respectively.  The  Fourier 
transformation  of  (1)  can  be  performed  by  introducing  the  following  form  of  u: 

u  =  .  (2) 


which,  when  substituted  into  (1),  yields  its  characteristic  equation  as 

=  (3) 

where  I  is  a  characteristic  length,  k  is  the  wave  number,  c  is  the  continuum  wave  speed 
equal  to  y/WJp,  and  zmd  {kl)  are  the  normalized  (nondimensional)  frequency  and  wave 
number,  respectively.  The  characteristic  equation  (3)  implies  that,  for  the  continuum  case, 
the  normalized  frequency  u;//c  is  linearly  proportional  to  the  nondimensional  wave  number 

m 

Let  us  now  consider  a  two-noded  linear  bar  element.  Assembling  a  uniform  mesh  of  two 
bar  elements  of  length  I  (see  Figure  1),  we  obtain  the  discrete  equation  at  the  interior  node 
m,  as 

EA 

pAlUm  =  —  (um-l  -  +  Utn+l)  (4) 

The  discrete  solution  analogous  to  (2)  is  of  the  form 

ui(t)  =  (5) 
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which,  when  applied  to  (4),  yields  the  following  discrete  characteristic  equation: 


I 


\ 


where  the  discrete  wave  number  k  is  defined  by 


k  = 


siny 

1 

2 


0  <  kl  <  TT 


(6) 


(7) 


Hence,  comparing  (C)  with  (3)  it  is  observed  that  the  effect  of  discretization  is  embodied 
in  the  approximation  of  the  continuum  wave  number  k  by  the  discrete  wave  number  k. 


In  order  to  extend  the  above  anedysis  applicable  to  higher*order  element  interpolations,  we 
generalize  the  discrete  Fourier  analysis  as  follows.  Instead  of  representing  the  displacements 
of  adjacent  nodes  by  (5),  the  Fourier  expansion  is  assumed  to  hold  only  for  alternate  nodes 
(see  Figure  1),  and  so  consider  the  two  coupled  difference  equations  at  nodes  m  —  1  and 
m: 


pAlUm-l  = 
pAlilm  = 


EA 

I 

EA 

I 


iUm-2 
(Wm— 1 


1  "b  Ujn') 


—  2Um  +  «m+l) 


(8) 


Substituting  (5)  into  (8)  together  with  the  expressions 


Um—l  —  ^  ‘^m—1 

„  _  ^2ikr, 

Uxfi—2  — 


2 

V-Tn  —  W  V-YTi 

^m+l  —  6  ^m— 1 


(9) 


we  obtain  the  following  Fourier- transformed  equation: 

L{k,uj)u  =  0 


where 


L(fc,a;) 


2  -  iff  -  (1  +  6^“') 
.-(1  +  6-^  )  2-iif 

rj> 

U  =  {U,„_1  Um} 


(10) 


(11) 

(12) 


The  characteristic  equation  is  found  by  requiring  a  non-trivial  solution  to  (10),  viz., 
det  L  =  0,  from  which  two  characteristic  roots  are  found  as 
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(13) 


The  first  root  clearly  agrees  with  that  of  the  case  of  single  interior-node  equation  (6), 
whereas  the  second  does  not  appear  to  be  consistent  with  the  physics  of  the  problem.  On 
a  closer  examination,  however,  it  can  be  shown  that  the  second  root  corresponds  nothing 
but  to  the  TT-pheise-shifted  case  of  (6).  This  can  be  explained  as  follows. 

Note  that,  in  terms  of  their  eigenvectors,  the  first  root  is  associated  with  Um-i  = 
while  the  second  with  Um-i  =  Therefore,  the  proper  wave  number  for  the 

second  root  is 


so  that 


(kl)^  =  TT  -  H  ,  0<kl<^ 


(14) 


(15) 


The  preceding  analysis  in  terms  of  two  coupled  difference  equations  enables  us  to  properly 
interprets  the  multiple  characteristic  roots  associated  with  high-order  elements.  We  are 
now  in  a  position  to  take  on  the  discrete  Fourier  analysis  of  quadratic  bar  elements. 


The  discretization  of  the  bar  equation  (1)  by  the  quadratic  elements  (see  Figure  2)  yields 
the  following  two  coupled  difference  equations: 


EA 

pAliim—l  —  j  (Um— 2 

■  2  ~^Tn  —  (—^m— 2  "b  1  l^Um  "b  ~  '^Tn+2) 


(16) 


where  the  first  is  for  the  mid-node  and  the  second  for  the  end  node  and  a  diagonal  mass 
matrix  is  used.  The  discrete  Fourier-transformed  operator  L  for  the  above  3-noded  bar 
equations  and  the  resulting  characteristic  equation  can  be  derived  as: 


L^{k,u) 


2 -(f)' 

_  (1 +,e-2.-^/)  1  + 


-(l-be^*'-') 

C03  2kl  _  1 

4  2  W  y  . 


(17) 


_  11+C03  2H  ^i^y 


-f  3(1  —  cos2^'/)  =  0 


(18) 
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Figure  3  compares  the  dispersion  curves  obtained  for  the  linear  and  quadratic  elements, 
as  compared  with  that  of  the  corresponding  continuum  equation  (1).  By  invoking  the 
same  interpretation  discussed  in  conjunction  with  the  double  roots  given  by  (13)  for  the 
linear  element,  the  upper  root  of  the  quadratic  element  is  plotted  versus  the  redefined  wave 
number  ir  —  kl.  It  is  clear  from  the  results  that  the  quadratic  element  performs  better  than 
its  linear  counterpart  for  an  equivalent  mesh  size. 

It  is  also  noted,  however,  that  the  quadratic  element  gives  rise  to  a  discontinuity  in  the  wave 
number/frequency  relation  at  kl  =  7r/2.  That  is,  there  is  a  range  of  frequencies  the  discrete 
model  will  simply  “skip”  over.  As  this  discrete  “forbidden”  zone  occurs  in  the  middle  of 
tae  spectrum  of  discrete  behavior  for  the  element  mesh,  it  can  be  of  particular  concern 
and  importance  to  acoustics  and  wave  propagation  analyses.  Hughes  [15],  in  studying 
mass  matrix  formulations  and  their  effect  on  low  frequency  convergence  for  course  meshes, 
produced  numerical  results  consistent  with  the  discrete  Fourier  analysis  for  quadratic  bar 
and  beam  elements. 

Figure  4  illustrates  how  dramatic  this  behavior  can  become  for  the  solution  of  transient 
dynamics  problems  using  quadratic  bar  elements.  It  shows  the  power  spectral  density  of 
a  nodal  displacement  response  for  a  random  initial  displacement  input,  using  a  uniform 
mesh  of  25  quadratic  (3-node)  bar  elements  and  an  implicit  mid-point  time  integration 
algorithm.  The  “forbidden”  discrete  frequency  zone  in  the  range  of  frequences  between 
1,45  eind  1.75  verifies  numerically  the  Fourier  analysis  results  for  the  quadratic  bar  element 
shown  in  Figure  3.  Such  forbidden  discrete  frequency  zones,  if  they  persist  for  quadratic 
elements  such  as  6  and  9-node  shell  elements,  10-node  tetralrcdral  solids,  and  so  forth,  will 
have  a  profound  effect  on  their  ability  to  capture  accurately  high-frequency  responses. 
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2.2  Discrete  Fourier  Analysis  of  Timoshenko  Beam  Elements 


The  strong  form  for  the  transverse  beam  vibration  problem  is  the  following  set  of  coupled 
differential  equations 

.d'^w  ^  f  d9\ 

"  di)  ,  , 

dt^  ^^\dx  V 


where  A  and  I  are  the  cross-sectional  area  and  moment  of  inertia,  G  is  the  shear  modulus, 
and  w  and  9  axe  the  transverse  displacement  and  generalized  rotation,  respectively.  The 
Fourier  solution  of  (19)  is  assumed  to  be  of  the  form 


where 


leads  to  the  transformed  system 


u  =  [to  9\^ 

Uo  =  [too 

too  =  to(i  =  0,t  =  0) 
9o  =  9{x  =  0,  <  =  0) 


L(u;,fc)uo  =  0 


with  the  continuum  Fourier  matrix  operator,  L(a;,  k),  given  by 


where 


c  =  i/IW,  ^  7  =  I/Af 


and  the  continuum  frequency  equation  derived  from  (23)  is 


By  inspection,  there  axe  two  roots  of  (25)  for  each  wave  number  value,  and  thus  two  unique 
vibration  modes,  the  bending  and  the  shear  wave  modes,  which  are  plotted  in  Figure  5. 
The  shear  wave  is  associated  with  extremely  high  frequencies  and  not  of  primary  concern 
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to  the  dynamist;  hence  we  will  focus  on  element  tailoring  only  with  respect  to  the  bending 
mode. 


For  the  case  of  the  two-noded  linear  beam  approximation,  a  similar  procedure  jis  employed 
for  a  linear  bar  element  leads  to  the  following  discrete  Fourier  matrix  operator,  L(u;,  k), 
given  by 


-  +  X{klf  -iXlikl)y/^ 


(26) 


where  k  is  defined  by  (7)  and  a’’  is  given  by 


(kif 


(27) 


Therefore,  the  discrete  frequency  relationship  for  the  linear  Timoshenko  beam  element  is 
given  by 


+  (kiy  =  0 


(28) 


Figure  5  shows  the  resultant  linear  element  frequency /wave  number  relationships  obtained 
from  (28),  along  with  the  corresponding  continuum  results  from  (25).  Jus'  as  with  the  bar 
element,  the  lineeur  beam  element  converges  quite  well  at  low  frequencies,  then  gradually 
diverges  from  the  continuum  curve. 

Similarly,  for  the  case  of  a  three-noded  quadratic  beam  element,  utilizing  the  transverse 
shear  and  bending  stiffness  and  the  lumped  mass  matrices,  we  obtain  the  discrete  Fourier- 
transformed  matrix  operator  L  as  follows 


(ujI\  ^ 

•— J  ^{umpect 


(29) 


where 


K 


shear  — 


2A 

0 

-A  (1  +  e-2''-') 
2t7 


0 

2A 


(1  -  ^  (1  + 


-A  (1  -f 

_ iA  ^sin  2kl'^ 


yi 


•t  ) 


Af  (1  _ 

iA/(sm^) 

±(2-cos2kl) 


(30) 


^^bending  — 


0 

2 

A 

0 


0  0 
0  -A  (l-be^*^') 
0  0 

-  ^  (1  -i-  e-2'^')  0  cos  2kl) 


(31) 
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^^lumped  — 


(32) 


10  0  0 

0  10  0 

001/2  0 
00  0  1/2 

from  which  the  following  characteristic  equation  results: 


where 


Cl  =  ^(cos2A:/  -  4)  -  (A  +  j) 


^11  +  cos2fc/^ 


C2  =  3(1  —  cos2fc/)(A^  +  “  cos2fc/)  +  -(5  +  cos2kl) 

\2  I 

+  T7r-(61  +  10  cos  2kl  +  cos^  2kl)  +  -fll  +  cos  2kl)^ 

12j  4' 

C3  =  ^(cos2fc/  -  1)(11  +  cos2fc/)(A  +  7)  +  ;^(cos^  2kl  —  26cos2kl  —  47) 
2  A  2'y 

C4  =  •'.'(!  —  cos  2kl)^ 


(34) 


Figure  6  illustrates  the  characteristic  frequency  vs.  wa^*  number  relation  represented  by 
(33),  along  with  the  corresponding  continuum  curves  fivm  (25).  As  with  the  quadratic 
bar  vibration,  the  quadratic  beam  displays  the  frequency  jumping  phenomenon  in  both  its 
bending  and  shecir  wave  curves,  although  it  is  not  as  pronounced  as  in  the  case  of  membrane 
waves  approximated  by  standard  quadratic  bar  elements.  At  this  point  the  analyst  might 
conclude  that,  as  the  quadratic  beam  elements  do  not  exhibit  as  pronounced  forbidden 
discrete  frequency  zones  as  that  found  in  the  bar  element,  it  may  not  be  of  concern  for  waves 
other  than  membrcine  caises.  In  the  next  section,  however,  we  show  that,  while  tailoring  the 
element  through  parameterization  can  lead  to  overall  improvements  in  performance,  the 
analyst  must  be  careful  to  avoid  “over-optimizing”  the  higher  frequency  ranges  as  it  may 
tend  to  accentuate  the  frequency- jumping  phenomenon  inherent  in  higher-order  elements. 
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3.0  Frequency-Window  Tailoring  of  Bar  Elements 

We  have  shown  in  the  preceding  section  that  the  discrete  Fourier  analysis  can,  for  a  uniform  • 

mesh  size,  give  an  analy tic/ symbolic  characterization  of  the  finite  element  discretizations. 

For  vibration  analysis  purposes,  these  characterizations  can  then  be  used  to  assess  quantita-  <’<' 

tively  the  discretization  accuracy  as  they  can  be  directly  compared  with  the  corresponding  I 

continuum  characteristics,  viz.,  wave  dispersion  characteristics.  Thus,  the  discrete  Fourier  | 

analysis  technique  can  be  applied  not  only  for  the  prediction  of  the  resulting  discrctiza-  I 

ticn  for  vibration  analysis,  but  more  importantly  for  the  tailoring  of  elemen  attributes  I 

in  order  to  improve  the  finite  element  model  accuracy.  For  example,  it  was  si; own  in  [12]  I 

that,  for  a  given  element  stiffness  matrix,  a  tailored  mass  matrix  as  a  linear  combination  | 

of  the  lumped  and  consistent  mass  matricies  can  significantly  improve  the  ccuracy  of  f 

the  low  frequencies  and  their  mode  shapes.  However,  little  improvement  can  e  made  for  I 

high-frequency  components  only  by  mass-matrix  tailoring.  | 

[ 

In  order  to  improve  the  accuracy  of  the  finite  element  models  for  the  high-frequency  com-  I 

ponents,  we  are  motivated  to  tailor  the  element  stiffness  matrices  for  a  given  nodal  pattern 
in  addition  to  mass  tailoring.  A  simple  case  to  test  such  a  stiffness  tailoring  concept  is 
to  employ  a  three-noded  discrete  nodal  pattern  for  a  bar  so  that  one  can  work  with  two  | 

discrete  element  stiffness  matrices:  a  3  X  3  quadratic  clement  stiffness  matrix  and  the  as-  I 

sc:  r-ibled  stiffness  matrix  of  two  linear  elements.  Hence,  we  obtain  the  following  three-noded 
t .  /lored  element  mass  and  stiffness  matrices: 

M  =  +  {l- 

K  =  -f  (1  -  aOK'’"*"’’ 

where  and  are  the  assembled  matrices  of  two  linear  elements,  and  Om  and  i 

cvjt  are  coefficients  to  be  determined. 


11 


3.1  Discrete  Fourier  Analysis  of  Tailored  Three-Noded  Bar  Element 


The  governing  nodcil  difference  equations  at  the  mid-span  node  m  —  1  and  the  adjacent 
end  node  m  are  given  by  (see  Figure  2) 


—(3  +  =  (3  -f  ttjt) 


Um-2  -  m— 1  +  Ur 

/2 


P  (n  fo  \  \  [ ‘^U-m  "1*  Uni-fl  "N  /  Um—2  "b  \ 

-(3  -  =  (3  -b  Qk)  I  - ( - 2/2 - ) 


(36) 


The  Fourier-transformed  discrete  operator  L  for  the  tailored  bar  with  the  embedded  per¬ 
formance  parameters  am  and  ak  can  be  derived  as 


L{k,oj) 


6  +  -  (3  +  a„)  (f )"  -(3  +  at)  (1  + 

-(3  +  at)  (1  +  6  +  at(l  +  cos  2ld)  -(S-a„)(ff 


(37) 


The  chciracteristic  equation  and  its  roots  are  then  found  to  be 


Cl 


+  C3  =  0 


(38) 


where 

:  i  =  9  -  or^ 

C2  ™  (3  -  Qf„,)(6  -b  2ajt)  -b  (3  -b  Om)  (6  +  ^^(l  +  cos  21*/)) 

C3  ~  (6  -b  2ajt)  (6  -b  0]k(l  +  cos  21*/))  -  2(3  -b  +  cos  2kl) 


2 

1 

2 

2 


C2  -  y/c^-4ciC3 
2c: 

C2+  \/c2  -4ciC3 
2ci 


(39) 


(40) 
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3.2  Frequency-Window  Tailoring  of  the  Bar  Element 


Our  objective  in  tailoring  the  bar  element  is  to  minimize  the  error  between  the  exact  and 
discrete  eigenvalues  in  an  average  sense  over  a  finite  range  of  the  frequency  spectrum. 
To  this  end,  for  each  frequency-window  range  a  <  kl  <  b,  we  perform  the  following 
minimization: 


min 


1- 


c  ^discrete 


—Y  J 

c  /  exact  -» 


d(kl) 


(41) 


subject  to  (3),  (39),  tind  (40). 


The  optimization  problem  posed  by  (41)  is  not  easily  solvable  in  a  closed-form  sense, 
however,  and  since  this  element  presents  the  simpliest  form  of  Fourier  equations  that 
are  of  practical  interest,  we  wish  to  consider  a  numerical  complement  to  (41)  which  can 
be  addressed  by  a  standmd  quadratic  optimzation  technique.  Therefore,  the  frequency- 
window  tailoring  method  is  recast  as 


min  J 


where 


^=E 


i=0 


J  _  V  c  /discrete 


if) 


2 

exact 


A/=a+i(6-a) 


(42) 


(43) 


subject  to  (3),  (39),  and  (40). 

Numerical  optimization  of  (42)  was  accomplished  using  quasi-Newton  methods  with  cen¬ 
tred  difference  derivative  approximations  [16].  Both  DFP  and  BFGS  formulae  were  utilized 
in  approximating  the  inverse  Hessian  matrix,  though  their  performance  was  generally  com¬ 
parable. 
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3.3  Tailoring  Results  for  the  Bar  Element 


Twelve  frequency  windows  were  considered  for  teiiloring  of  the  bar  element.  The  first  six 
cover  the  wave  number  range  of  0  <  A:/  <  n-  in  even  intervals;  the  results  can  be  found  in 
Table  1.  The  last  six  frequency  windows  cover  progressively  wider  ranges  of  kl  culminating 
in  the  final  result  which  tailors  Om  and  for  the  entire  element  spectral  range;  these 
results  axe  shown  in  Table  2.  In  ail  cases,  the  numerical  optimization  used  the  quadratic 
element  parameters  as  a  starting  point,  and  the  initial  and  final  values  of  J  are  noted. 
Figures  7  through  14  illustrate  a  selection  of  these  results,  along  with  the  corresponding 
curves  for  the  continuum  equation  and  the  discrete  linear  and  quadratic  elements.  In 
all  cases,  the  tailcring  method  produces  mau-ked  improvements  for  each  specified  window, 
most  dramatically  for  high-frequency  and/or  high- wave  number  ranges. 


4.0  Frequency- Window  Tailoring  of  Beam  Elements 


Encouraged  by  the  improved  accuracy  of  the  tciilored  three-noded  bar  elements  for  window- 
by- window  frequency  ranges,  this  section  extends  the  basic  tailoring  procedure  to  the  Tim¬ 
oshenko  beam  element.  In  doing  so,  we  adopt  the  same  three-noded  beam  eL.aent  nodal 
pattern  as  in  the  case  of  the  tailored  bar  element  with  one  important  additional  feature. 
The  tailored  element  stiffness  matrix  consists  of  the  teiilored  bending  and  the  tailored 
transverse  shear  contributions,  each  of  which  in  turn  consists  of  a  linear  combination  of  a 
quadratic  and  a  corresponding  assembled  two-element  linem  component,  viz., 


K 

Kj/iear 

^bending 

^'^lumped 


"b  ^bending 


+  (1  - 
+  (1  - 
''  +  (1  - 


(44) 


14 


4.1  Tailoring  Results  for  Beam  Elements 

The  tailoring  method  proceeds  as  with  the  bar  element  by  a  discrete  Fourier  synthesis  of 
the  elemental  eigenproblem,  incorporating  the  resulting  equations  into  the  performance 
index  (43),  and  a  numericaJ  optimization  of  the  performance  index  to  obtain  the  tailoring 
parameters  orj,  eind  am-  As  noted  in  Section  2.2,  the  frequency-window  tailoring  has 
focused  only  on  the  bending  curve  waves.  If  necessary,  a  shear  wave  tailoring  can  be 
performed  zis  well. 

Results  of  severed  frequency-window  tailoring  cair  be  found  in  Table  3  and  Figures  11 
through  14.  Case  I  summarizes  the  tailored  mass  and  stiffness  parameters  for  the  range 
of  ^  <  A:/  <  y.  As  can  be  observed  in  Fig.  12,  the  accuracy  improvement  by  the  tailored 
element  is  rather  dramatic.  Errors  of  up  to  6%  in  predicted  frequencies  from  the  nominal 
linear  and  quadratic  elements  are  reduced  to  less  than  0.5%  over  this  entire  low  frequency 
window.  In  addition.  Figure  11  illustrates  how  the  tailored  element  zdso  exhibits  a  marked 
improvement  in  frequency  prediction  over  the  full  discrete  frequency  range,  far  beyond  the 
designated  “design”  window. 

Case  II  lists  the  tailoring  of  mass  and  stiffness  matrices  performed  in  the  high- wave  number 
range  of  ^  <  kl  <  ^.  As  Figure  13  illustrates,  however,  convergence  within  the  desired 
spectrum  does  not  imply  superior  overall  behavior  (the  limits  of  the  tailored  spectrum  are 
shown  as  vertical  lines  in  the  figure).  Here,  the  optimized  parameters  tend  to  accentuate 
the  forbidden  discrete  frequency  zone,  and  allows  significant  errors  in  the  low  to  middle 
spectrum  that  was  optimized  in  Case  I.  These  errors  are  as  much  as  60%  with  respect 
to  the  exact  continuum  solution  in  this  mea  of  the  spectrum.  Clearly,  the  influence  of 
this  “forbidden  zone”  is  significant  enough  to  warrant  careful  attention  to  the  selection  of 
frequency  optimization  windows  eind  performance  indicies. 

In  addition  to  the  aforementioned  problems  in  low  frequency  ranges,  the  unconstrained 
optimization  of  Case  II  resulted  in  parameters  which  lead  to  a  negative-definite  element 
stiffness  matrix.  Thus,  the  performance  exhibited  by  the  dispersion  curves  in  Figure  13 
is  not  practically  realizable.  Case  Ila  used  the  unconstrained  optimization  results  as  a 
starting  point,  then  relaxed  the  stiffness  tailoring  parameters  to  regain  the  correct  positive 
definite  character  of  the  element  stiffness  while  retaining  much  of  the  improved  element 
behavior  in  the  desired  frequency  window.  The  issues  of  element  feasibility  and  tailoring 
parameter  constraints  are  covered  in  more  detail  in  Section  4.2. 

The  final  case  studied,  Case  III,  is  an  attempt  to  optimize  the  overall  element  behavior, 
producing  reasonably  small  errors  in  the  middle  to  high  spectrum  while  maintaining  ac¬ 
curacy  m  the  low  spectrum.  The  result  (see  Figure  14)  shows  that  the  magnitude  of  the 
forbidden  zone  has  been  reduced  and  the  accuracy  improved  to  within  ±10%  for  the  range 
of  Q  <  kl  <  while  the  errors  of  both  the  standard  linear  and  quadratic  elements  are 
rapidly  increasing  as  kl  increases. 
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4.2  Tailoring  Parameter  Constraints 


So  far,  we  have  used  unconstrained  optimization  techniques  to  determine  appropriate  ele¬ 
ment  tailoring  parameters,  by  ignoring  contrciints  that  must  be  imposed  on  these  variables 
to  ensure  element  feasibility.  Two  important  considerations  are  the  preservation  of  ele¬ 
ment  rigid-body  modes  emd  total  mass.  By  using  convex  combinations  of  the  linear  and 
quadratic  element  formulations,  it  is  easy  to  show  that  these  are  maintained  for  arbitrary 
values  of  the  element  performance  parameters.  In  other  words,  these  properties  are  pre¬ 
served  automatically.  However,  we  must  also  preserve  the  positive  definiteness  of  the  mass 
matrix,  and  positive  semi-definiteness  of  the  stiffness  matrix,  and  ensure  the  optimization 
does  not  introduce  spurious  element  mechzinisms. 


For  the  bzir  element,  these  constraints  result  in  the  following  parameter  restrictions: 


|Qr„,l  <  3 
ock  >  -3 


(45) 


Practically  speaking,  the  restriction  on  the  stiffness  teiiloring  parameter  is  superfulous  if 
the  element  is  optimized  over  a  reasonably  broad  range  of  its  spectrum,  zmd  the  Fourier 
analysis  can  be  used  a  posteriori  to  examine  the  tailored  element  behavior,  as  it  is  a  robust 
method  for  identifying  deficiencies  in  the  element  formulation  [17].  In  other  words,  if  we 
enforce  the  tailored  mass  matrix  to  be  positive  definite  through  an  inequality  constraint  on 
ajnt  and  the  Fourier  analysis  identifies  the  vibration  behavior  of  the  element  as  acceptable 
over  its  full  spectral  range,  there  is  no  need  to  constrain  the  stiffness  tedloring  parameters. 

It  should  be  noted,  however,  that  while  the  Fourier  analysis  will  identify  problematic 
behavior  in  the  element  formulation,  the  evidence  can  be  subtle  and  easily  missed  when 
examining  the  dispersion  curve  at  a  finite  set  of  discrete  wave  number  values.  For  example, 
in  Section  4.1  it  was  noted  that  the  optimization  performed  in  Case  II  lead  to  a  negative- 
definite  element  stiffness  matrix.  However,  the  negative  eigenvalues  found  by  examining 
the  Fourier  dispersion  curve  for  the  tailored  element  were  in  the  range  of  0  <  kl  < 
in  other  words  in  the  lowest  .01%  of  the  element  spectrum.  A  more  reliable  method 
(given  the  difficulty  in  deriving  a  general  parEuneter  constraint  for  elements  more  complex 
than  bars)  would  to  check  the  positive  semi-definiteness  of  the  element  stiffness  matrix  at 
each  step  in  the  optimization  process,  and  then  limit  the  step  size  of  the  iteration  if  the 
semi-definiteness  stiffness  constraint  is  violated. 

With  this  in  mind,  the  beam  element  parameter  constraint  is  given  by 

M  <  3  (46) 


By  using  a  lumped  mass  formulation,  it  should  be  simple  to  determine  the  mass  param¬ 
eter  constraint  even  in  the  case  of  complex  two  and  three-dimensional  elements.  The 
computational  overhead  associated  with  diccking  the  clement  stiffncs.s  inatri;<  for  negative 


eigenvalues  is  very  small  compared  to  the  basic  optimization,  and  certainly  much  eas¬ 
ier  than  deriving  the  dispersion  curve  at  a  sufficient  nurrber  of  values  to  ensure  element 
feasibility. 


4.3  Numerical  Results  for  Tailored  Finite  Element  Models 

A  final  issue  which  must  be  addressed  is  the  resultant  accuracy  of  assembled  finite  ele¬ 
ment  models  ’  'mg  the  tailoring  methods  described  herein,  both  in  terms  of  frequencies 
and  mode  shapes.  The  primary  utility  of  the  Fourier  analysis  within  the  context  of  the 
frequency- window  tailoring  technique  is  to  closely  correlate  the  temporal/spacial  frequency 
characteristics  of  the  parameterized  element  to  the  corresponding  partial  differential  equa¬ 
tions  of  motion.  However,  if  the  eigenvectors  of  an  assembled  model  do  not  correlate  well 
to  the  associated  continuum  wave  shapes  over  the  discrete  spectrum,  the  model  cannot 
be  assumed  to  accurately  capture  the  continuum  wave  dynamics.  Fortunately,  through 
numerical  tests,  it  can  be  verified  that  the  tailored  elements  maintain  the  mode  shape 
characteristics  of  the  linear  and  quadratic  elements  on  which  they  are  based. 

For  the  bar  vibration  problem,  with  constrained  ends,  the  wave  shapes  follow  the  Fourier 
spacial  expansion  used  in  the  tailoring  method.  That  is 

<f)n{x)  =  n  =  l,2,...  (47) 

Figure  15  demonstrates  how  the  Fourier  uispersion  curve  accurately  predicts  the  resultant 
element’s  numerical  behavior.  The  discrete  points  shown  are  the  calculated  frequencies 
of  a  lO-eleniCnt  mesh  of  tmlored  beirs,  with  the  given  parameters,  plotted  against  the 
corresponding  wave  number  eis  determined  by  best-fitting  the  eigenvector  of  the  mode  to 
the  Fourier  spacial  expansion.  Not  only  is  the  appicability  of  the  Fourier  analysis  verified, 
but  the  results  also  show  how  the  discrete  model  eigenmodes  uniformally  cover  the  discrete 
spectrum,  as  is  the  case  with  lineeir  bar  elements.  Figures  16  and  17  show  that,  for  both 
moderate  and  high  frequency  modes,  the  eigenvectors  of  the  tailored  model  retain  the  same 
basic  character  as  their  linear  element  counterparts,  which  themselves  exactly  match  the 
continuum  wave  shapes  at  the  node  point  locations.  Figures  18  through  20  show  similar 
results  for  a  high-frequency-windov/  tailoring  case. 
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5.0  Discussions 


A  frequency-window  tadoring  technique  is  preser  .1  for  improving  finite  element  models 
that  cEin  be  used  to  capture  accurately  frequencies  jid  their  mode  shapes  up  to  acoustic 
ranges.  The  tailoring  of  the  element  mass  and  stiffness  matrices  is  achieved  by  a  combi¬ 
nation  of  linear  emd  quadratic  elements  for  both  bar  and  beam  elements.  The  tailoring 
pzurameters  are  optimized  for  each  frequency  window  by  minimizing  the  errors  between  the 
continuum  and  discrete  characteristic  dispersion  curves.  For  the  case  of  beam  elements, 
the  stiffness  tailoring  is  further  partitioned  into  a  component-by-component  contribution  of 
the  transverse  shear  and  the  bending  stiffness  matrices.  Such  a  component-by-component 
tailoring  has  proved  to  be  a  key  feature  of  the  present  tailoring  technique. 

The  accuracy  improvements  realized  by  the  present  tailoring  technique  have  been  first 
predicted  by  the  discrete  Fourier  cmalysis.  The  results  demonstrate  that  the  tailored 
elements  drzimatically  improves  the  accuracy  of  both  the  frequencies  amd  their  mode  shapes 
far  beyond  that  of  the  linear  or  quadratic  elements.  This  is  also  coroborated  via  numerical 
eigenvalue  analyses.  Although  the  results  contained  herein  are  primarily  the  analytical 
and  numerical  eigenvalues  of  the  various  element  formulations,  it  should  be  remember  that 
the  fundamental  product  of  the  tailoring  methods  are  the  stiffness  and  mass  interpolation 
parameters  themselves,  which  aire  a  function  of  the  frequency  window  chosen  by  the  analyst 
to  optimize. 

There  are  two  overhead  aspects  associated  with  the  present  tailoring  technique.  The  first  is 
to  employ  a  quadratic  nodal  topology  for  constructing  the  tailored  element  matrices.  The 
second  is  to  ceirry  out  severed  frequency  window-by-window  eigenvalue  analyses  in  order 
to  cove  the  entire  range  of  frequencies  of  interest.  These  must  be  more  than  made  up  by 
a  substeintieJly  reduced  number  of  the  nodal  degrees  of  freedom  by  the  present  tailored 
elements.  For  exaunple,  for  the  wave-number  range  of  j  <  fc/  <  tt,  the  tailored  bar  and 
beam  elements  yield  the  frequency  accuracy  within  a  few  percent  for  kl  =  2.5  from  Fig.  13, 
whereas  the  conventional  linear  and  quadratic  beam  elements  must  reduce  their  clement 
sizes  by  a  factor  of  about  ten  and  five,  respectively,  if  the  same  accuracy  is  to  be  maintained 
by  these  elements. 

A  straightforward  extrapolation  of  the  preceding  accuracy  comparisrn  to  two  and  three 
dimensional  problems  would  result  in  100  and  25  smaller  nodal  unknowns  when  the  present 
technique  is  employed.  As  the  extension  of  the  present  tailoring  technique  to  plate,  shell 
and  solid  elements  appear  to  be  strciightforward,  a  bigger  pay-off  of  the  present  technique 
may  accrue  as  applied  to  two  and  three  dimensional  elements.  For  example,  for  shell 
elements,  the  tailoring  of  element  stiffness  can  be  achieved  by  synthesizing  the  membrane, 
the  transverse  shecir,  and  the  bending  components.  This  is  being  carried  out  at  present 
eind  we  plrin  to  report  the  results  in  a.  future  occasion. 
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Case 

Window 

Ok 

Jq 

Jhp 

I 

0  <  i'/  <  f 

0.9990 

1.0263 

1.23  X  10"® 

0.10  X  10“® 

II 

f  <  <  f 

0.4948 

1.1840 

8.44  X  10-* 

0.19  X  10-'‘ 

III 

f  <fc/<f 

0.5409 

1.3188 

0.1018 

4.60  X  lO""* 

IV 

0.6200 

1.5631 

0.1289 

0.0041 

V 

VI 

VI 

0.7439 

1.9880 

0.2729 

0.0216 

VI 

^  <  fc/  <  TT 

0.9435 

2.7517 

1.4866 

0.0851 

Table  1:  Teiiloring  Results  for  Bcir  Element  (Narrow  Frequency  Windows) 


Case 

Window 

OCm 

OCk 

Jq 

Jhp 

VII 

0  <  ki  <  f 

0.4948 

1.1837 

9.32  X  10-^ 

0.25  X  10"'^ 

VIII 

f 

0.6539 

1.4629 

0.2136 

0.0124 

IX 

'f<kl<Tr 

0.8505 

2.3723 

1.7977 

0.3571 

X 

Q<kl<^ 

0.5577 

1.2887 

0.1385 

9.85  X  10“'* 

XI 

J  <  kl  <  TT 

0.8005 

2.1897 

1.9782 

0.6210 

XII 

0  <kl  <TT 

0.7804 

2.1101 

2.0701 

0.7615 

Table  2:  Tedloring  Results  for  Bar  Element  (Broad  Freqm  zy  Windows) 
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Case 

Oim 

OCk 

OCb 

Jq 

Jhp 

I 

f  <fc^<f 

0.6692 

1.0018 

0.6244 

0.0048 

5.1801  X  10"® 

II 

1.5321 

1.0854 

-0.2374 

32.7325 

1.1998  X  10-“* 

VI 

VI 

1.5321 

1.0800 

0.5000 

32.7325 

0.1745 

III 

0<kl<^ 

0.8508 

1.0331 

0.6685 

0.7713 

0.0653 

Table  3:  Tailoring  Results  for  Beam  Element 
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Figure  1:  Nodal  Geometry  for  Linear  2-node  Line  Elements 


m-l 


m+1 


Figure  2:  Nodal  Geometry  for  Quadratic  3-node  Line  Elements 
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Normalized  Frequency,  (wl/c) 


Normalized  Wave  Number,  kl 


Figure  3:  Fourier  Analysis  Results  for  Bar  Elements 
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Frequency,  (wl/c) 


Figure  5:  Fourier  Analysis  Results  for  Linear  Beam  Element 
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Nomalizcd  Frequency,  (wl/c) 
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Figure  6;  Fourier  Analysis  Results  for  Quadratic  Beam  Element 
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Normalized  Frequency,  (wl/c) 


Figure  7:  Fourier  Analysis,  Tailored  Beir  Element,  Case  III 
Tailored  Wave  Range:  ^  <  kl  <  j 
Parameters:  =  0.5409,  a,  =  1.3188 
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Percentage  Frequency  Error 


1.1  1.2  1.3  1.4  1.5 


Normalized  Wave  Number,  kl 

Figure  8:  Frequency  Error,  Tailored  Bar  Element,  Case  III 
Tailored  Wave  Range:  f  <  fc/  <  f 
Parameters:  am  =  0.5409,  a^  =  1.3188 
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Normalized  Wave  Number,  kl 


Figure  9:  Frequency  Error,  Tailored  Bar  Element,  Case  VI 
Tailored  Wave  Range:  ^  <kl  <7: 
Parzuneters:  am  =  0.9435,  a*  =  2.7517 
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Figure  10: 


1.4  1.6  1.8  2 

Normalized  Wave  Number,  kl 


Frequency  Error,  Tailored  Bar  Element,  Case  VIII 
Tailored  Wave  Range:  j  <kl 
Parameters:  am  =  0.6539,  0;;^  =  1.4629 
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Figure  11;  Fourier  Analysis,  Tailored  Beam  Element,  Case  I 
Tailored  Wave  Range:  f  <  kl  <  j 
Pairameters:  am  =  0.6692,  a,  =  1.0018,  at  =  0.6244 
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Figure  12:  Frequency  Error,  Tailored  Beam  Element,  Case  I 
Tailored  Wave  Range:  f  <  fcl  <  f 
Parameters:  =  0.6692,  a,  =  1.0018,  aj  =  0.6244 
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Figure  13:  Frequency  Error,  Tailored  Beam  Element,  Case  II 
Tailored  Wave  Range:  ^  <  kl  <  ^ 

Parameters:  =  1.5321,  a,  =  1.0854,  aj  =  —0.2374 
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Figure  15:  Comparison  of  Fourier  Analysis  to  Discrete  Bar  Model,  Case  III 
Tailored  Wave  Range:  f  <  fc/  <  f 
Parameters:  am  =  0.5409,  ak  =  1.3188 
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Figure  16:  Low  Frequecy  Mode  Shape  for  Discrete  Bar  Models,  Case  III 
Tailored  Wave  Range:  f  <  fc/  <  f 
Pcurameters:  ocm  —  0.5409,  orjk  =  1.3188 
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Figure  18:  Comparison  of  Fourier  Analysis  to  Discrete  Bar  Model,  Case  VI 
Tailored  Wave  Range:  ^  <kl  <tt 
Parameters:  am  =  0.9435,  ajt  =  2.7517 


Figure  19:  Low  Frequecy  Mode  Shape  for  Discrete  Bar  Models,  Case  VI 
Tailored  Wave  Range:  ^  <kl  <% 

Parameters:  =  0.9435,  ak  =  2.7517 
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Figure  20:  High  Frequecy  Mode  Shape  for  Discrete  Bar  Models,  Case  VI 
Tailored  Wave  Range:  ^  <kl  <  it 
Parameters:  am  =  0.9435,  ak  =  2.7517 
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Figure  21:  Comparison  of  Fourier  Analysis  to  Discrete  Beam  Model,  Case  Ila 
Tailored  Wave  Range:  ^  <kl 
Parameters:  am  =  1.5321,  a,  =  1.0800,  =  0.5000 
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Abstract 

A  systematic  procedure  for  determining  the  lumped  mass 
matrix  and  improved  consistent  mass  matrices  has  been  pro¬ 
posed  for  vibration  analyses  by  the  finite  element  method. 
The  procedure  is  based  on  the  discrete  Fourier  analysis  which 
enables  one  to  compare  the  numerical  approximations  with 
the  corresponding  continuum  characteristics.  The  procedure 
is  applied  to  vibrations  of  bar.  Euler-Bernoulli  beam  and  plate 
bending  elements.  The  results  obtained  by  the  present  pro¬ 
cedure  clearly  indicate  that  a  judicious  use  of  the  improved 
mass  matrices  offered  in  the  paper  can  lead  to  a  significant 
accuracy  improvement  for  intermediate  frequencies  that  can 
play  important  roles  in  modeling  of  control-structure  interac¬ 
tion  systems,  dynamic  localizations  and  acoustic  responses 
for  space  structures  and  underwater  vehicles. 

1.  Introduction 

The  question  of  mass  lumping  or  rather  the  systematic 
construction  of  mass  matrix  for  the  vibration  and  tran¬ 
sient  analysis  of  structures  by  the  finite  element  method 
remans  to  date  an  unresolved  issue.  Apparently,  it  was 
Archer- (1963)  who  first  introduced  a  procedure  for  gener¬ 
ating  mass  matrices  based  on  the  same  displacement  nape 
functions  that  are  used  in  the  construction  of  element  stiff¬ 
ness  matrices.  The  mass  matrices  generated  according  to 
Archer’s  procedure  have  become  known  as  “consistent” 
mass  matrices.  In  contrast  to  the  consbtent  mass  matrix, 
a  diagonal  or  diagonalized  mass  matrix  is  refered  to  as  a 
lumped  mass  matrix. 

Even  though  the  use  of  consistent  mass  matrices  yields  for 
most  applications  better  accuracy  in  the  frequency  analy¬ 
sis,  the  lumped  mass  matrix  continues  to  be  prefered  by 
the  practicing  engineers  due  to  its  computational  simplic¬ 
ity  and  a  data  storage  saving  in  the  computer.  Such  attrac¬ 
tive  features  of  the  lumped  mass  matrix  motivated  several 
investigators  in  the  past  to  propose  various  mass  lumping 
procedures  such  as  a  row  sum  of  the  consistent  mass  ma¬ 
trix  (Leckie  and  Lindberg,  1963;  Tong,  Plan  and  Buciarelli, 
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1971;  Krieg  and  Key,  1973),  the  use  of  a  scaled  diagonal 
entries  from  the  stiffness  matrbe  (Hinton  et  al,  1976),  a 
selective  sum  of  a  low  order-based  consistent  mass  matrix 
(Fried  and  Malkus,  1975),  and  combinations  of  these. 

Although  existing  mass  lumping  procedures  are  intuitively 
appealing  for  low-order  elements  such  as  constant  strain 
bar,  and  beam  bending  elements,  such  intuitive  (or  ad- 
hoc)  procedures  become  quickly  ambiguous  for  high-order 
elements.  As  a  result,  at  present  no  agreed-upon  lumped 
mass  matrices  exist  for  cubic  Euler-Bemoulli  beams  and 
for  eight-noded  serendipity  plate/shell  elements.  Hence, 
there  exists  a  lack  of  a  systematic  procedure  for  mass  lump¬ 
ing. 

The  objective  of  the  present  paper  is  first  to  develop  a  sys¬ 
tematic  lumping  procedure  based  on  the  discrete  Fourier 
analysis  of  the  finite  element  method  (Park  and  Flaggs, 
1984;  Flaggs,  1988)  and  second  to  symbolically  synthesize 
a  series  of  more  accurate  mass  matrices  for  vibration  anal¬ 
ysis  when  intermediate  frequencies  become  important.  To 
this  end,  the  paper  is  organized  as  follows. 

Section  2  revisits  a  discrete  Fourier  analysb  of  a  bar  mod¬ 
eled  by  the  linear  displacement  approximation.  A  defini¬ 
tion  of  the  lumped  mass  matrix  is  proposed  by  compar¬ 
ing  the  characteristic  equation  for  the  continuum  bar  with 
that  for  the  constant-strain  bar.  A  simple  synthesis  of  an 
improved  mass  matrix  for  the  bar  is  proposed  and  its  ac¬ 
curacy  is  assessed  in  terms  of  its  wave  dispersion  curve. 
An  example  vibration  problem  with  simply-fixed  bar  ends 
is  analyzed,  which  demonstrates  the  systematic  nature  of 
the  proposed  definition  of  a  lumped  mass  matrix  and  the 
general  nature  of  discrete-Fourier  synthesized  mass  matri¬ 
ces. 

Section  3  applies  the  present  definition  of  lumped  mass 
matrices  to  a  cubic  Euler-Bernoulli  beam  element.  .A  prin¬ 
cipal  theory  from  the  analysis  of  the  cubic  Euler-Bemoulli 
beam  confirms  the  numerically  well-known  result  that  the 
lumping  of  the  translational  degree  of  freedom  (tu)  and  the 
neglect  of  the  rotational  freedom  ( ^)  yields  a  most  accu¬ 
rate  frequency  prediction.  The  present  theory  succinctly 
illustrates  that  such  a  mass  matrix  is  the  only  theoretically 
consistent  approximation.  A  problem  analyzed  by  Archer 
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is  revisited  in  order  to  asses  our  improved  mass  matrix.  It 
is  shown  that  the  proposed  synthesized  mass  matrix  con¬ 
siderably  improves  the  third  and  fotirth  frequencies  for  a 
two^lement  beam,  thus  estabibhing  the  soundness  of  the 
proposed  synthesized  mass  matrix. 

The  present  improved  mass  modeling  for  plate  vibrations 
is  presened  in  Section  4.  To  this  end,  a  Fourier  analysis  of 
the  frequency  vs.  wave  number  characteristics  is  carried 
out  for  an  infinite  plate  of  both  the  continuum  and  finite 
element  approximation  by  a  four-noded  element.  Such  a 
Fourier  analysb  b  believed  to  provide  insight  into  the  ac¬ 
curacy  of  vibration  analysb  for  an  infinite  plate.  In  oder  to 
utilize  the  mass  matrix  modeling  based  the  discrete  Fourier 
analysb,  finite  element  plate  vibrations  with  free  edges 
have  been  performed  with  increasing  meshes.  The  results 
obtained  from  the  discrete  Fourier  analysb  and  numerical 
tests  for  free-edge  plate  vibrations  indicate  that  a  best  ac¬ 
curacy  for  an  infinite  plate,  when  analyzed  by  a  four-node- 
element,  b  achieved  by  an  average  of  the  lumped  and  the 
consbtent  matrices.  For  plate  with  finite  dimensions,  a 
best  accuracy  b  achievable  with  quarter  of  the  lumped 
mass  and  three  quarter  of  the  consbtent  mass  matrix. 

2.  A  Proposition  for  Lumped  Mass  Matrix 

A  mass-matrix  lumping  procedure  that  we  are  about  to 
propose  b  based  on  the  discrete  Fourier  analysb.  Since 
an  easy  example  of  Fourier  analysb  that  one  can  perform 
b  the  centinuum  equation  for  a  bar  and  its  discrete  coun¬ 
terpart,  we  will  first  introduce  their  Fourier  analyses.  We 
will  then  identify  the  Fourier  operators  for  mass  matrices. 
In  a  simplest  term,  our  proposed  lumped  mass  matrix  b 
defined  as  follows: 

Let  k  be  the  reave  number  and  Mean  Fourier- 
transformed  consistent  mass  matrix,  then  the  lumped 
mass  matrix,  Mtamp,  >ri  the  Fourier  domain  is  defined 

Mfom?  =  Ihn  Meon  (2 ' 1) 

0 

We  will  now  illustrate  our  proposition  for  mass-lumping 
procedure  via  a  bar  clement. 

The  equation  of  motion  for  a  uniform  elastic  bar  can  be 
written  as 

where  p,E,n,x  are  the  density.  Young's  modulus,  the  dis¬ 
placement  and  the  coordinate,  respectively. 

The  traditional  Fourier  analysb  begins  by  seeking  a  general 
harmonic  wave  solution  of  Eq.  (2.2)  of  the  form 

u  =  (2  •  3) 

with  w  being  the  circular  frequency,  k,  the  corresponding 
wave  number  and  i  =  \/^.  Substitution  of  Eq.  (2.3)  into 
Eq.  (2.2)  yields 

L(w,  jfc)  -  u  =  0  (2-4) 

where  the  Fourier  operator,  L[u,k),  b  given  explicity  by 


L(w,  Jt)  =  -pw^  T-  Ei?  (2  •  5) 

For  our  subsequent  dbcussions,  we  reexpress  the  above 
equation  as: 


fX(w,Jt)_=-a;=M««t-fK, 

I  with  Macact  =  P  ■  1.  K  =  . 


A  corresponding  discrete  Fourier  analysb  can  be  per¬ 
formed  when  the  bar  equation  (2.2)  b  approximated  by  the 
finite  element  method,  which  has  been  studied  by  many  in¬ 
vestigators  (Bazant,  1978;  Belytschko  and  Mullen,  1978; 
Vichnevetsky,  1982;  Park  and  Flaggs,  1984;  Celep  and 
Tdrhan,  1987;  Flaggs,  1988).  -4  preoccupation  of  these 
studies,  however,  was  to  address  the  efiect  of  internal  en¬ 
ergy  discretizations  on  the  wave  propagation  characteris¬ 
tics. 

In  the  present  study,  we  will  examine  the  effect  of  kinetic 
energy  discretizations  on  the  frequency  characterbtics  and 
deduce  from  such  a  study  the  proposed  mass-lumping  pro¬ 
cedure  as  well  as  a  synthesb  procedure  to  obtain  improved 
consbtent  mass  matrices.  To  thb  end,  let  us  revbit  a  con¬ 
stant  bar  element.  An  elementary  finite  element  implemen¬ 
tation  gives  for  a  bar  element  of  a  uniform  length,  t,  the 
following  discrete  equation  for  an  interior  node,  m  (e.g.. 
Park  and  Flaggs,  1984): 

s 

f  (2«_i  -r  4tt„  -r  S^+i)  -t-  7x(am-i  -  2u«  -r  u^+i)  =  0 

'  P'0 

Substitution  of  Eq.  (2.3)  into  Eq.  (2.7)  yields  the  dberete 
Fourier  operator 

4-  K°  (2  -  8) 


-Mean  =  p  -  (1  -  jkT),  K°  =  EP  (2  -  9) 

in  which  the  discrete  wave  number,  is  defined  as 


-  sm  ■:r 
k  s  — ^ 


(2-10) 


For  a  lumped  mass  matrix,  we  have  the  following  discrete 
Fourier  operator: 


i|omp{w.*)  =  -W'Miamp  -r  K 

where 

I^ffomp  "  P  "  I 

Test  of  Proposed  Mass-Lumping  Procedure  (2.1): 
Note  that  he  expanded  to  read 

=P-(l  -  g*'" 


(2-11) 


(2.12) 


(2-13) 


1533 


Hence,  we,  have 


lim  IMco,,  "-I  P  *  1  —  ^^(ump  (2  *  14) 

Ai—0 

which  proves  our  proposition  (2.1)  to  be  valid  (at  least  for 
the  bar!). 

We  now  address  our  second  related  task:  a  systematic  way 
of  constructing  consistent  mass  matrices  that  can  lead  to 
improved  vibration  analysis.  To  this  end,  we  note  that  the 
characteristic  equation  that  relates  the  wave  number  (fc) 
to  the  frequency  (w)  is  obtained  by  setting  L{u,k)  =  0  for 
the  continuum  case: 


(^)^=fc^  (2-15) 

where  the  wave  speed,  c,  defined  as  c  =  \/Ejp  is  constant. 

The  characteristic  equation,  Eq.  (2.15),  indicates  that  for 
the  continuum  solution,  the  wave  number,  k,  is  directly 
proportional  to  the  frequency,  w,  i.e.,  k  =  uife.  With  c 
constant,  each  Fourier  component  of  a  wave  group  will 
propagate  without  dispersion  with  the  same  phase  velocity. 

To  examine  the  effect  of  kinetic  energy  discretizations  on 
the  accuracy  of  vibration  analysis,  we  propose  the  following 
simple  modification  for  our  proposed  mass  matrix: 

=  (l  Of)  •  IVlfttmp  "h  0£  *  Klcon  (2  *  16) 

where  a  is  a  constant  to  be  determined.  Note  that  0  =  0 
corresponds  to  the  case  of  lumped  mass  matrix  uid  a  ==  1 
to  consistent  mass  matrix,  respectively.  Figure  1  illustrates 
frequency  vs  wave  number  for  the  continuum  bar,  the  dis¬ 
crete  bar  with  the  consistent  mass  matrix  (a  —  l)  and 
Vfith  the  lumped  mass  (a  =  0),  and  for  an  averaged  mass 
matrix(a  =  0.5).  Although  not  shown  in  the  figure,  other 
Vv\lues  of  ot  can  be  selected  in  order  to  tailor  the  accuracy 
requirement  for  different  frequency  ranges.  For  example, 

a  =  (0.5, 0.5682r’ 59, 0.89207289)  gives  a  most  accurate 
result  for  small  k,  ft  around  x/2  and  k  around  v,  respec¬ 
tively.  Hence,  depending  upon  the  critical  accuracy  range 
of  interest,  one  can  adjust  the  mass  matrix  accordingly. 


0.0  0.5  1.0  1.5  2.0  2.5  3.0  o.S 

NORMAEIZED  WAVE  NUMBER  (*() 

Figure  1.  Dispersion  Curve  for  Linear  Bar 


3.  Mass-Lumr'  ^  of  a  Euler-Bernoulli  Beam  El¬ 
ement 


In  the  preceding  section  we  have  succintly  demon¬ 
strated  that  a  discrete  Fourier  analysis  can  provide  numer¬ 
ical  and  physical  insight  into  mass  lumping  as  well  as  can 
lead  to  improved  mass  matrix  approximation.  In  this  sec¬ 
tion,  we  will  demonstrate  that  the  proposed  mass  lumping 
procedure  is  also  applicable  for  cases  that  require  disc’‘»te 
Fourier  matrix  operators. 

The  homogeneous  differential  equatiohs  of  motion  for 
the  Euler-Bernoulli  beam  can  be  expressed  as 


(3.1) 


where  A  is  the  cross-section  area  of  the  beam,  I  is  the 
moment  of  inertia.  With  a  general  harmonic  wave  solution 
of  (3.1)  of  the  form 


w  = 


we  obtain  the  Fourier  operator  for  the  beam  as 
Lii^^  ^)  “  IV  .  Ndesact  4*  Kbeam 

where 


^*46*0^  —  P-^)  ^heam  —  Elk 


(3-2) 

(3-3) 

(3-4) 


A  cubic  interpolation  of  w  gives  for  each  beam  element  of 
length  I  the  following  consistent  mass  and  stiffness  matri¬ 
ces 
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420 


k; 
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61 

4f* 
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-12 
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12 

-6t 

. 

20 

-6t 

40  _ 

(3-5) 


(3-6) 


where  the  elemental  nodal  degrees  of  freedom  are  arranged 
as 

u*  =  lwi,0i,u;2,p2j^  (3-7) 

By  assembling  two  interior  beam  elements  and  designating 
their  nodes,  m  —  l,m  and  m  +  1,  respectively,  we  obtain 
the  following  difference  equations  for  the  mth  node: 


+  312u)m  +  —  ?i»+l)} 

+  ^{l2(-(u.„_,  +  2u).»_i  -  iu«+i)  +  ° 

'•  (5») 

+  Vm+l)  +  t’(-38«-i  +  iSm  -  3ii*+0} 

+^{6Z(iu„_  I  -  «.,+,)  +  2r*(#*_i  +  +  #»+i)}  =  0  (3  •  9) 
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Since  u)y  and  ffy  are  treated  independently,  we  seek  a  solu¬ 
tion 

to  Fourier-transform  the  coupled  difference  equation  (3.8) 
and  (3.9)  to  obtain 

(-a/®M:on  +  Kfc.<.,„)||^|=0  (3.11) 


where 


^con  —  pA.t 


KlKam  — 


in  which 


12F:/ 

43 


mil 

-»mi2 

— i4sin  kl 


imia 

itsmkL 


(3-12) 


(3-13) 


f  mu  =  (1  -  *12  =  -^4smA:4,  ,  . 


Mf.»,  = 
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420 
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-3«» 
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“  «o 
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0 

-3C* 

0 

4C2 

(3-19) 

Hence,  for  simple  elements  the  ad-hoc  d.o.f.-by-d.o.f.  row- 
summation  procedure  is  justified  in  view  of  the  present 
proposed  mass-lumping  procedure  (2.1).  It  should  be  men¬ 
tioned  that  care  must  be  exercised  in  lumping  matrices  for 
higher-order  elements. 

Now,  to  address  the  accuracy  associated  with  the  choice 
of  a  mass  matrbc  (either  Miump  or  Mcon>  or  even  their 
combinations).  Let  us  examine  the  characteristic  equation 
from  (3.11),  viz. 


+ia^ttiaie 


-  ta^/iin  W  -  -^i*) 

a  _  12S7 
j)Ai* 


1  =  0, 


(3  ■  20) 


which  yields  the  following  frequency  vs.  wave  number 
equation: 


The  lumped-mass  matrbc,  according  to  our  proposition 
(2.1),  for  the  mth  assembled  node  thus  becomes 


^lump  —  l^con  —  pA4 


JL  , 

210  J 


(3-15) 


which,  when  translated  into  the  element  mass  matrix,  is 
equivalent  to 


..  t 
2 


M 


lump 


=  pAi 


irLo 


420 


(3-16) 


Remark;  One  of  ad-hoc  mass-lumping  procedures  is  to 
sum  up  d.o.f.-by-d.o.f.  contribution.  When  this  ad-hoc 
technique  is  applied  to  the  element  mass  matrbc  (3.5),  one 
obtains: 


Mfu„p  -  M,”  ,„p  +  Mf„„p 


(3-17) 


where  for  the  tu-d.o.f.s  we  have 


M 


For  the  d-d.o.f.s 
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(3  18) 


{(mu»»a2  -  ’"la)'*’*  “  i»*(triui#  +  2im3  jk4  •  »nia  +  ritaaPt*)!!)* 

+‘’*13]^  =  ° 

(3-21) 

where  =  1  - 

Comparing  (3.21)  with  its  differential  equation  counterpart 
(3.3),  one  concludes  that,  since  mu  is  the  translation2d 
part,  we  must  have 


This  =  0  .■'’’d  rh22  =  0 
if  (3.21)  is  to  have  the  following  form: 

W  Nllf,p,p  +  K&eam  —  9 


(3  •  22) 


(3  •  23) 


Hence,  we  conclude  that  for  the  cubic  Euler-Bemoulli 
beam  element  to  be  consistent  with  the  original  differential 
equation,  one  must  employ  the  following  mass  matrix  (i.e., 
mi3  =  0,  mas  =  0): 


M* 


pAt 

420 


210-54(2  0  54a  0 

0  0  0  0 

54a  0  210 -54a  0 

0  0  0  0 


0<a<  1 
(3.24) 


Application  of  (3.24)  yields  the  following  characteristic 
equation: 

+  =  0  (3.25) 

so  that  we  have  from  (3.3)  and  (3.25)  the  following  fre¬ 
quency  equations 
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Figure  2.  Dispersion  Curve  for  Beam  with  Rotatory  In¬ 
ertia 


For  continuum  case: 


For  the  consistent  matrix  (3.23); 


(3  •  26) 


(3  •  27) 


Figure  2  shows  the  frequency  vs.  the  normalized  wave 
number  for  the  mass  matrix  (3.24),  that  is,  neglecting  the 
rotational  contribution  to  the  element  mass  matrix  (see 
Eq.  (3.25)).  For  this  case,  the  larger  the  value  of  a,  the 
more  accurate  the  frequency  curve  becomes. 

Of  course,  as  shown  by  Archer  (1963),  the  use  of  the  con¬ 
sistent  mass  matrix  (3.5)  may  improve  the  frequency  ac¬ 
curacy  over  the  lumped  mass  matrbc  (3.16).  In  order  to 
gain  insight  into  the  role  of  various  mass  modeling  on  the 
accuracy  of  vibration  frequencies  and  mode  shapes,  we  ap¬ 
ply  our  proposed  mass  matrix  formula  (2.16)  by  combining 
(3.5)  and  (3.16)  to  obtain: 

‘210-540  2250  S4o  -I3«a 

_  Mf  2250  5*(1  +  3o)  1350  -35*0 

~  420  540  135a  210  -  S4o  -225a 

L  -135a  -35*a  -225o  5*(l  +  3o) 

(3-28) 

Figure  3  illustrate  the  effect  of  the  mass-matrix  averag¬ 
ing  based  on  the  above  averaged  mass  matrix  (3.28).  Ap¬ 
plication  of  the  above  mass  matrix  yields  a  characterstic 
equation  that  is  similar  to  (3.21)  except  m.-y  are  modified 
accordingly.  As  was  the  case  for  the  bar,  a  =  (0,1)  cor¬ 
responds  to  the  lumped  matrix(3.16)  and  the  consistent 
matrbc(3.5),  respectively.  It  is  observed  that  for  a  =  0.25 
the  discrete  frequency  vs  the  wave  number  curve  follows 
almost  on  top  of  the  continuum  case. 


Figure  3.  Dispersion  Curve  for  Beam  without  Rotatory 
Inertia 

In  order  to  assess  the  preceding  a  priori  determination  of 
improved  mass  matrix  based  soley  on  the  symbolic  analy¬ 
sis,  we  have  performed  the  vibration  analysis  of  a  free-free 
beam  modeled  by  two  elements  and  compared  the  present 
results  with  the  one  performed  by  Archer(1963).  The  re¬ 
sults  are  summarized  in  Table  1.  It  is  observed  that  the 
conventional  consistent  mass  matrix  (a  =  l)  gives  an  error 
of  about  1%  for  the  first  mode,  13%  for  the  second  mode, 
45%  for  the  third  mode  and  40%  for  the  fourth  mode. 
In  contrast,  for  an  averaged  mass  matrix  (a  =  0.5),  the 
corresponding  errors  are  27%  for  the  first  mode,  2.5%  for 
the  second  mode,  1.8%  for  the  third  mode  and  1.7%  for 
the  fourth  mode,  respectively.  It  is  noted  that  the  sym¬ 
bolic  analysis  results  predict  a  —  0.5  to  be  most  accurate 
while  the  finite  element  solutions  indicate  that  a  =  0.25 
to  be  most  accurate.  We  conjecture  that  this  discrepancy 
is  due  to  the  fact  that  the  finite  element  solutions  are  for 
finite  beams  whereas  the  Fourier  analysis  assumes  an  infi¬ 
nite  beam.  Neverthless,  the  accuracy  prediction  based  on 
the  discrete  Fourier  analysis  as  given  in  Figure  3  provides 
a  qualitative  measure  of  different  mass  modeling  choices. 


SOLUTION  TYPE 

W| 

Ui, 

'jJi 

EXACT 

5.5944 

15.481 

30.268 

49.945 

o  =  0.0 

3.4273 

38.983 

41.428 

50.438 

a  =  .25 

3.7075 

19.101 

32.911 

50.304 

II 

o 

4.0934 

18.377 

30.821 

50.794 

a  =  .75 

4  6«{0 

15.867 

32.474 

53.399 

a  =  1.0 

5.6058 

17.544 

43.370 

70.087 

Table  1.  Frequencies  Computed  by  Different  Mass  Mod¬ 
els  for  Beam 

Incidently,  the  large  error  of  the  averaged  mass  matrix  for 
the  first  mode  b  not  too  much  of  concern  because  as  the 
number  of  elements  are  increased,  the  first  mode  quickly 
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converges.  In  fact,  when  five  elements  are  assembled,  the 
error  for  the  first  mode  reduced  to  less  than  0.2%  with 
a  =  0.5  while  maintaining  high-accuracy  for  the  higher 
modes.  On  the  other  hand,  the  conventional  mass  matrix 
( a  =  1)  continues  to  give  large  errors  for  the  higher  modes. 


4.  Mass  Matrices  for  Plate  Vibration  Analysis 

So  far  we  have  shown  that  the  proposed  mass-lumping 
procedure  (2.1)  and  the  improved  consistent  mass  matrbc 
(2.16)  lead  to  substantial  improvements  in  beam  vibra¬ 
tion  analyses.  In  particular,  depending  upon  one’s  desire 
for  tailoring  the  analysis  accuracy  for  certain  frequency 
ranges,  one  can  modify  a  priori  the  mass  matrix  as  pro¬ 
posed  by  (2.16)  and  manifested  iii  Figs.  1-3.  For  La¬ 
grange  family  of  plate  and  shell  elements,  the  proposed 
lumping  procedure  (2.1)  is  equally  applicable.  In  this  sec¬ 
tion  we  will  first  examine  plate  vibrations  based  on  the 
Reissner-Mindlin  plate  theory.  We  will  then  employ  a  dis¬ 
crete  counterpart  of  the  continuum  theory  when  the  plate 
is  approximated  by  a  four-noded  plate  bending  element. 
We  will  then  compare  the  continuum  and  symbolically  gen¬ 
erated  discrete  equations  in  the  Fourier  domain  in  order  to 
gain  insight  into  the  effect  of  mass  matrices  on  the  accuracy 
of  vibration  frequencies.  We  have  found  that  our  symbolic 
analysis  of  the  discrete  plate  equations  obtained  from  the 
finite  element  plate  bending  equations  provides  a  qualita¬ 
tive  measure  of  solution  accuracy.  Numerical  experiments 
corroborate  symbolic  analysis  results;  for  the  case  of  the 
four-noded  plate  bending  element,  the  mass  matrix  that 
yields  a  best  accuracy  is  determined  to  be 

=  0.2SM,„mp  -1-  0.75  •  M  con 

We  now  present  a  detailed  analysis  and  some  numerical 
results. 


The  Reissner-Mindlin  plate  equations  can  be  expressed  in 
differential  matrix  form  as  (Park  and  Flaggs,  1985): 


where 


L(i,y,0-u  =  f 

=  f^  =  l0,0,gj 

-^12^ 

~D. 


L(z,y,0  = 


sym. 


in  which  is  defined  as 


D-f, 


all 


-I 


(4-2) 

(4-3) 


D  ~ 
ax 


D  — 

dy 


-D.V^ 

I 


(4-4) 


and 


V 


dx^  ay2 


„  Eh?  „  l-t'r. 

^“I2(l-y2)’  2 


1  4*  i/ 

Di2  =  ^D,  D.= 


KEh 
2(1  +  v) 


(4.5) 


(4.6) 


in  which  9,  and  w  are  the  rotations  and  the  transverse 
displacement;  q  is  the  transverse  load  per  rmit  area;  E,  h,  u 
and  K  are  Young’s  modulus,  plate  thickness,  Poisson’s  ratio 
and  the  shear  correction  factor;  /  and  m  are  the  rotatory 
inertia  and  the  plate  mass  with  x,y  and  t  denoting  the 
Cartesian  coordinates  and  time,  respectively. 

The  Fourier-transformed  matrbc  operator,  L{kx,ky,u)  , 
can  be  obtained  for  the  plate  equations  by  seeking  a  har¬ 
monic  solution  of  the  form 


u(a:,  y,  t)  =  u(zo,  yo,  to)  exp(y(wt  -  k^x  -  k^y)]  (4 . 7) 


where  w  is  the  circular  frequency  and  ^.y)  are  the  (z,  y)- 
directional  wave  numbers  with  j  —  y/^.  Substitution  of 
(4.7)  into  (4.4)  yields 


L(kx,ky,(v)  — 


-Dkl  ~DMy  -jD,kx‘\ 
-Dnkl 


•D, 

+/w2 


sym. 


-Dkl  -jD.ky 
-Dnkl 
-D. 

+/w* 

-D.V* 


(4.8) 


with  the  corresponding  imcoupled  continuum  Fourier  op¬ 
erator  given  by 


f 


=  [(Z>V®  +  /w2)(V2  +  ^iw^)  - 
L  D, 


(4.9) 


where  the  Fourier-transformed  V*  operator  is  defined 

V^  =  -[kl^kl)  (4.10) 

The  Fourier-transformed  continuum  matrix  operator, 

L{kz,ky,u),  given  by  (4.8)  and  the  uncoupled  continuum 
-c 

operator,  I  ,  by  (4.9)  will  serve  as  our  reference  equations 
with  which  the  corresponding  discrete  operators  and  char¬ 
acteristic  equations  will  be  compared  in  order  to  assess  the 
effect  of  mass  lumping  on  plate  vibrations. 

The  general  discrete  Fourier  operator  that  approximates 
the  continuum  case  by  the  four-noded  plate  bending  ele¬ 
ment  can  be  shown  to  be  (Park  and  Flaggs,  1985): 
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L^  = 


-dEJi,  -jD»J»V3E7ii 

-D.Xiir 
+/u*((l  -  a) 

+i.i,) 


-Dll*2x, 
+/(j*((l  -  a) 


tym. 


D.(EJi,  +  EJi.) 


(■•■11) 


where  the  discrete  fourier  numbers,  and  k^,  and  the 
so-called  directional  averaging  operators,  l{x,y)  and  X(*.v). 
are  given  by 


'  kx  =  3in(fc,l*/2)/l,/2 
-  1  (4-12) 

and  a  is  a  coefficient  defined  in  the  mass  matrix  modeling 
formula  (2.16). 

The  relation  between  the  frequency  vs.  the  wave  number 
for  the  continuum  and  the  discrete  cases  can  be  obtauned 
by  requiring  the  determinant  of  (4.8)  and  (4.11)  to  be  zero. 
Figure  4  shows  the  normalized  frequency  {wf?\/pjD  )  vs. 
the  wave  number  {kt)  for  the  continuum  and  the  discrete 
cases  with  various  mass  matrix  choices  based  on  (2.16).  It 
is  noted  that  the  two  cases  correspond  to  an  infinite  plate 
or  so-called  interior  solutions.  Judging  from  Fig.  4,  one 
may  conclude  that  the  choice  of  a  =  0.5  (the  average  of 
the  lumped  and  the  consistent  matrices)  should  perform 
beat  for  plate  vibration  analysis.  For  plates  with  finite 
dimensions  as  we  shall  see,  the  influence  of  boundary  edges 
must  be  taken  into  consideration  in  utilizing  the  dispersion 
curve  for  selecting  an  appropriate  choice  of  mass  matrix. 


Table  2  summarizes  the  vibration  analysis  results  with  dif¬ 
ferent  mass  matrices  for  a  plate  with  free  edges.  In  com¬ 
puting  the  errors  in  frequency  computations  by  the  four- 
noded  element,  we  have  assumed  that  the  frequencies  given 
in  (Leissa,  1969)  to  be  the  converged  answers.  The  finite  el¬ 
ement  anedysis  results  indicate  that  the  discrete  frequency 
vs.  wave  number  curve  shown  in  Fig.  4  overestimates  the 
frequency  error  from  the  above.  We  conjecture  that  this 
overestimation  manifested  in  the  present  discrete  Fourier 
analysis  may  have  been  due  to  the  failure  of  the  four-node 
element  to  satisfy  rigorously  the  free-edge  conditions. 

Table  2.  Frequency  Errors  Computed  by  Different  Mass 
Models  for  Free-Free  Plate 


MASS  TYPE 

MODE 

4x4  Mesh 

10x10  Meah 

38x36  Mtsh 

1 

•30.1 

•8.S 

-3.9 

Lumped 

z 

-3Z.8 

-15.7 

-8.4 

Mtm 

3 

•2S.S 

•11.2 

-5.7 

a  =  0.0 

4 

1S.4 

•20.0 

5 

- 

- 

•20.0 

1 

-34.7 

-0.2 

-Z.7 

2 

-20.4 

-11.5 

-0.0 

a  =  .2S 

3 

-17.7 

-7.0 

-3-t 

4 

- 

14.7 

-7.Z 

s 

- 

- 

-17.0 

I 

-17.7 

-3.0 

-1.3 

2 

•17.7 

.6.8 

-3.4 

a=.S0 

3 

•0.0 

-2.1 

•0.9 

4 

- 

9.1 

-4.2 

S 

- 

-12.9 

1 

-3.5 

-0.7 

•0.2 

2 

-S.l 

•1.2 

-0.7 

a=.7S 

3 

0.1 

3,7 

1.8 

4 

• 

Z.2 

-0.8 

5 

• 

- 

8.Z 

4.8 

2.4 

1.1 

Couiittst 

Z 

16,1 

5.8 

Z.5 

Mu> 

3 

Z9.a 

10.8 

4.8 

a=  1.0 

4 

- 

8.5 

Z.S 

$ 

- 

- 

Z.5 

The  mode-by-'i-.'jde  convergence  characteristics  for  the  first 
three  modes  are  shown  in  Figs.  5-7.  With  the  exception 
of  the  first  mode  with  a  (2  x  2)-mesh,  it  is  observed  that 
the  choice  of  a  =  3/4  consistently  yields  the  most  accurate 
results. 


NUMBER  OF  ELEMENTS  ALONG  SIDE 


0.0  ' — - ' - = - ' - i- 

0-0  0.5  1.0  l.S  2.0 


NORMALIZED  WAVE  NUMBER  (*1) 


Figure  4.  Dispersion  Curve  for  Plate  (i/  =  0.343) 


Figure  5.  Error  in  First  Mode  for  Different  Mass  Model¬ 
ing 
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NUMBER  OF  ELEMENTS  ALONG  SIDE 

Figure  6.  Error  in  Second  Mode  for  Different  Mass  Mod¬ 
eling 


0  2  4  6 
NUMBER  OF  ELEMENTS  ALONG  SIDE 

Figure  7.  Error  in  Third  Mode  for  Different  Mass  Mod¬ 
eling 


5.  Discussions 

In  the  present  paper  a  systematic  technique  of  obtain¬ 
ing  the  lumped  mass  matrices  for  vibration  and  transient 
analyses  has  been  proposed.  The  technique  is  based  on 
the  symbolic  discrete  Fourier  analysis  and  reduces  to  the 
known  mass  lumping  techniques  for  simple  elements. 

In  order  to  further  increase  the  accuracy  of  frequency  com¬ 
putations,  an  improved  form  of  mass  matrix  is  proposed, 
which  is  a  combination  of  the  lumped  and  consistent  mass 
matrices.  Numerical  results  conducted  for  a  plate  with 
free  edges  indicate  that  substantial  improvements  in  in¬ 
termediate  frequency  computations  can  be  realized  if  one 
judiciously  employs  a  combined  mass  matrix. 

So  far  the  present  work  has  focused  on  modifying  the  mass 
matrices  in  order  to  improve  the  frequency  accuracy.  Our 
future  work  will  focus  on  the  combined  tailoring  of  both 


the  mass  and  stiffness  matrices  to  capture  inermediate  fre¬ 
quency  components  more  accur.ately.  It  should  be  noted 
that  the  accuracy  of  intermediate  frequency  components 
based  on  the  finite  element  methods  is  increasingly  impor¬ 
tant  in  the  areas  of  control  of  flexible  structures,  underwa¬ 
ter  acoustics  and  wave  propagation  through  composites 
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SUMMARY 

This  paper  reports  on  our  exp.  icnce  in  solving  large-scale  finite  element  transient  problems  on  the 
Connection  Machine.  We  begin  with  an  overview  of  this  massively  parallel  processor  and  emphasize  the 
features  which  are  most  relevant  to  finite  element  computations.  These  include  virtual  processors,  parallel 
disk  I/O  and  parallel  scientific  visualization  capabilities.  We  introduce  a  distributed  data  structure  and 
discuss  a  strategy  for  mapping  thousands  of  processors  onto  a  discretized  structure.  The  combination  of  the 
parallel  data  structure  with  the  virtual  processor  mapping  algorithm  is  shown  to  play  a  pivotal  role  in 
efficiently  achieving  massively  parallel  explicit  computations  on  irregular  and  hybrid  two-  and 
three-dimensional  finite  element  meshes.  The  finite  element  kernels  written  in  C*.  Paris  have  run  with 
success  to  solve  several  examples  of  linear  and  non-linear  dynamic  simulations  of  large  problem  sizes.  From 
these  example  runs,  we  have  been  able  to  assess  in  detail  their  performance  on  the  Connection  Machine.  We 
show  that  mesh  irregularities  induce  an  MIMD  (Multiple  Instruction  Multiple  Data)  style  of  programming 
which  impacts  negatively  the  performance  of  this  SIMD  (Single  Instruction  Multiple  Data)  machine, 
r inally,  we  address  some  important  theoretical  and  implementational  issues  that  will  materially  advance  the 
application  ranges  of  finite  element  computations  on  this  highly  parallel  processor. 


1.  INTRODUCTION 

Parallel  computers  are  having  a  profound  impact  on  computational  mechanics.  This  is  reflected 
by  the  continuously  increasing  number  of  publications  on  finite  elements  and  parallel  processing. 
Not  only  have  some  computational  strategies  been  re-designed  for  implementation  on  commer¬ 
cially  available  multiprocessors,  but  also  some  innovative  algorithms  have  been  spurred  by  the 
advent  of  these  new  machines.  However,  many  of  the  reported  parallel  finite  element  simulations 
have  been  on  systems  with  a  few  processors.  Examples  of  these  systems  are  Intel’s  iPSC  with  32 
processors  (reported  by  Farhat  and  Wilson*),  JPL, Caltech’s  hypercube  with  32  processors,’ 
Alliant’s  FX8  model  with  8  processors'*' and  CRAY’S  systems  with  up  to  4  processors.*  (For 
more  complete  lists  of  references  on  this  topic  see  White  and  Abel^  and  Noor.’  While  great 
speed-ups  were  measured  on  these  coarse  to  medium  grain  machines,  Farhat®  has  shown  that 
traditional  vector  supercomputers  could  not  be  outperformed  in  finite  element  simulations 
(except  of  course  on  systems  which  connect  more  than  one  vector  superprocessor,  such  as  the 
CRAY  X-MP  and  CRAY-2  systems,  each  of  which  has  4). 

Recently,  massively  parallel  machines  have  demonstrated  their  potential  to  be  the  fastest 
supercomputers,  a  trend  that  may  accelerate  in  the  future.  While  solving  the  shallow  water 
equations,  McBryan  has  reported  that  the  Connection  Machine  (CM_2  in  the  s.quel)  (65  536 
processors)  was  three  times  faster  than  the  four-processor  CRAY  X-MP.”*  Gustafson  ct  al.  have 
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developed  highly  parallel  solutions  for  baffled  surface  wave  equations,  unstable  fluid  flow  and 
beam  strain  analysis,  and  have  reported  performances  on  NCUBE’s  1024-processor  hypercube 
which  are  close  to  those  of  vector  supercomputers.*® 

The  objective  of  the  present  study  has  been;  first,  to  evaluate  the  multiprocessing  features  of  the 
CM_2  that  are  relevant  to  finite  element  computations:  second,  to  develop  a  suitable  finite 
element  data  structure  which  exploits  the  system  architecture;  third,  to  implement  a  decomposi¬ 
tion/mapping  procedure  that  matches  as  far  as  possible  the  layout  of  the  processors  to  the  finite 
element  meshes:  and  fourth,  to  assess  those  implications  of  finite  element  analysis  on  the  CM_2 
that  should  be  considered  in  the  design  of  future  massively  parallel  processors.  Hence,  we  focus 
primarily  on  implementational  issues  that  are  critical  for  the  full  exploration  of  the  multiprocess¬ 
ing  capabilities  of  the  CM_2,  and  only  secondarily  on  solution  algorithms,  as  far  as  they  impact 
the  present  study  on  implementational  issues. 

The  finite  element  equations  of  motion  for  structural  systems  can  be  expressed  as 

iMd -f  F'"(i  d)  =  F"  (0 

where  M  denotes  the  positive  definite  lumped  mass  matrix.  F'"  and  F“  denote  the  internal  and 
external  force  vectors,  and  d,  d  and  d  denote  respectively  the  acceleration,  velocity  and  displace¬ 
ment  vectors.  In  the  linear  case,  the  internal  force  vector  becomes 

P"  =  Dd  Kd  (2) 

where  D  and  K  are  the  damping  and  stiffness  matrices  respectively,  which  are  positive  semi 
definite.  In  this  work,  an  eventual  damping  is  assumed  to  be  proportional  to  the  mass  and 
stiffness. 

The  algorithmic  nature  of  a  candidate  solution  method  for  the  structural  dynamics  equation  (1) 
can  significantly  influence  the  software  requirements,  data  communications  and  arithmetic 
efficiency.  As  our  main  focus  is  on  implementational  issues  rather  than  algorithmic  ones,  we  have 
decided  on  a  simple  explicit  time  integration  procedure.  Hence,  we  choose  to  integrate  equation 
(1)  with  the  fixed  step  explicit  central  difference  algorithm  because  (a)  it  is  inherently  parallel,  and 
(b)  it  has  the  largest  undamped  stability  limit  among  second-order  accurate  explicit  linear 
multistep  algorithms,  as  has  been  demonstrated  by  Krieg.'*  In  our  context,  it  expressed  as 

jn+i,2  ^  ^  /,M-'(F"(t")  -  F'"(d",  d"))  (3) 

d"^'  =  d" -i- 

where  h  is  the  fixed  time  step  and  the  superscript  n  indicates  the  value  at  the  discrete  time  t". 

The  remainder  of  this  paper  deals  with  the  massively  parallel  solution  of  (1)  using  (3).  and  is 
organized  as  follows.  In  Section  2,  we  give  an  overview  of  the  CM_2  hardware  configuration  and 
emphasize  those  features  which  arc  pertinent  to  finite  element  computations.  In  particular,  we 
address  issues  that  are  related  to  the  processor  memory  size,  to  the  SIMD  architecture  and  to  the 
fast  interprocessor  communication  package,  the  .V£fFS  grid.  In  Section  3,  we  discuss  the  floating 
point  arithmetic  performance  of  the  CM_2  and  highlight  its  current  dependence  on  the  selected 
language  compiler  Algebraic  manipulations  coded  in  *Lisp  arc  shown  to  be  three  times  as  fast  as 
when  written  in  C*.  A  general  purpose  finite  element  distributed  data  structure  is  presented  in 
Section  4.  Designed  originally  to  handle  massively  parallel  finite  element  explicit  computations 
on  irregular  and  hybrid  meshes,  this  parallel  data  structure  is  also  very  efficient  for  parallel  1,0 
manipulations  and  parallel  graphic  animation.  Since  the  often-encountered  mesh  irregularities 
inhibit  the  use  of  the  .VEH'S  grid  communication  package,  we  discuss  in  Section  5  an  alternative 
decomposition,  mapping  strategy.  The  decomposition  technique  is  designed  to  minimize  both  the 
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amount  of  communication  between  different  chips  and  the  amount  of  wire  contention  within 
a  chip.  The  mapping  algorithm  attempts  to  reduce  the  distance  that  information  must  travel. 
Section  6  summarizes  the  overall  organization  of  the  massively  parallel  transient  simulation.  In 
Section  7,  our  parallel  data  structure  and  processor  mapping  are  applied  to  (3)  for  the  solution  of 
various  large-scale  transient  problems.  Measured  performances  are  analysed  in  detail.  Mesh 
irregularities  are  shown  to  be  the  source  of  several  factors  which  considerably  slow  down  the 
machine.  Finally,  in  Section  8,  we  address  some  important  theoretical  and  implementational 
issues  that  will  materially  advance  the  application  ranges  of  finite  element  computations  on  the 
CM_2.  In  particular,  we  note  that  time  integration  numerical  algorithms  such  as  explicit  finite 
differences  and  equation  solvers  such  as  the  preconditioned  conjugate  gradient  are  implemented 
using  the  same  parallel  data  structure  and  mapping  algorithm  which  are  presented  in  this  paper. 
We  compare  the  substructuring  technique  and  the  virtual  processor  approach,  and  comment  on 
the  implications  of  implicit  algorithms  for  the  effective  use  of  the  CM_2. 

2.  THE  CONNECTION  MACHINE  HARDWARE  ARCHITECTURE 

Here  we  present  an  overview  of  the  CM_2  system  organization  and  discuss  issues  that  are 
pertinent  to  massively  parallel  finite  element  computations.  See  Hillis*’  for  an  in-depth  discussion 
on  the  rationale  behind  the  CM_1  (a  previous  model  of  the  Connection  Machine),  the  Technical 
Summary  of  Thinking  Machines  Corporation*^  for  further  architectural  information  and 
McBryan^  for  initial  studies  of  scientific  computations  on  the  CM_1.  For  the  sake  of  clarity,  we 
summarize  the  architectural  features  before  discussing  their  impact  on  finite  element  simulations. 


2.1  System  organization 

2.1.1.  CM-2:  The  parallel  processing  unit.  The  CM_2  is  a  cube  1-5  m  on  a  side,  made  of  up  to 
eight  subcubes  (Plate  1).  Each  subcube  contains  512  chips  and  every  chip  includes  16  bit  serial 
processors  which  are  connected  by  a  switch.  Each  individual  processor  has  64  Kbits  (8  Kbytes)  of 
bit-addressable  local  memory  and  an  arithmetic-logic  unit  (ALU)  that  can  operate  on  vari¬ 
able-length  operands.  Every  two  chips  may  share  an  optional  Wcitek  floating  point  accelerator 
chip.  A  fully  configured  CM_2  thus  has  4096  (2*^)  chips,  2048  floating  point  accelerator  chips, 
65536  processors  and  512  Mbytes  of  memory.  The  chips  are  arranged  in  a  12-dimensional 
hypercube.  A  chip  i  is  directly  connected  to  12  other  chips  J,  with  the  binary  representation  of 
i  and  j  differing  only  by  1  bit.  The  CM_2  system  provides  two  forms  of  communication  between 
the  processors. 

•  A  general  mechanism  known  as  the  router  which  allows  any  processor  to  communicate  with 
any  other  processor.  Each  CM_2  chip  contains  one  router  node  i  which  serves  the  16 
processors  on  the  chip,  numbered  16/ to  16/  -t-  15.  The  router  nodes  on  all  the  chips  are  wired 
together  in  a  12-dimensional  Boolean  cube  and  together  form  the  complete  router  network 
(Figure  1).  For  example,  suppose  that  processor  117  (processor  5  on  router  node  7)  has 
a  message  to  send  to  processor  361  (processor  9  on  router  node  22).  Since  12  =  1  -r  1*  —  2", 
router  7  forwards  the  message  to  router  6  (6  =  7  -  2°)  which  forwards  it  to  router  22 
(6  -r  2'*).  which  delivers  the  message  to  processor  361. 

•  A  more  structured  and  somewhat  faster  communication  mechanism  called  the  NEWS  grid. 
Each  processor  is  wired  to  its  four  nearest  neighbours  in  a  two-dimensional  rectangular  grid 
(Figure  2).  Communication  on  the  jV£  WS  grid  is  extremely  fast  and  recommended  whenever 
it  is  possible. 
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Figure  1.  The  router  network 
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Figure  1  The  NEWS  grid 


An  important  practical  feature  of  the  CM_2  is  the  support  for  virtual  processors.  When  the 
CM_2  is  initialized  for  a  run,  the  number  of  virtual  processors  (vp  in  the  sequel)  may  be  specified. 
If  it  exceeds  the  number  of  available  physical  processors,  then  the  local  memory  of  each  processor 
is  split  up  into  a  number  of  regions  equal  to  the  ratio  between  the  number  of  vps  and  the  number 
of  physical  processors.  Automatically,  for  every  Paris  (PARallel  Instruction  Set)  instruction,  the 
processors  are  time-sliced  among  the  regions.  If  a  physical  processor  is  simulating  .V  vps,  each 
Paris  instruction  is  decoded  by  the  sequencer  (as  explained  below)  only  once  for  S  e.xecutions. 
This  results  in  an  enhanced  user  performance.  Also,  the  use  of  a  vp  >1  allows  the  pipelining  of 
floating  point  operations  in  the  Weitek  chips,  which  provides  an  additional  enhancement  to 
machine  performance.  The  system  organization  of  a  CM_2  is  shown  in  Figure  3. 

The  CM_2  is  an  SIMD  machine.  All  processors  must  execute  identical  instructions  or  some 
processors  may  choose  to  ignore  any  instruction.  Consequently,  an  instruction  which  involves 
a  nested  binary  branch  can  see  its  execution  time  increased  by  a  factor  of  two.  The  SIMD  nature 
of  the  CM_2  has  some  disadvantages  in  finite  element  computations,  as  will  be  shown. 

2.2.  Impact  on  finite  element  computations 

It  is  well  known  that  the  solution  algorithm  (3)  can  be  implemented  using  only  clement-level 
computations.  Hence,  if  each  vp  of  the  CM_2  is  mapped  onto  one  finite  clement,  equation  ( 1 )  can 
be  etficicntly  integrated  in  parallel.  The  rationale  behind  this  processor-to-elcmcnt  assignment 
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will  be  analysed  in  Sections  4  and  8.  Here,  we  discuss  the  direct  impact  of  the  CM_2  hardware  on 
such  a  decision. 

2.2.1.  The  local  memory  and  element  level  computations.  Consider  the  9-node  curved  shell 
element  shown  in  Figure  4,  Three  displacements  and  two  rotations  are  attributed  to  each  node, 
which  amounts  to  a  total  of  45  degrees  of  freedom  per  element.  Consequently,  the  symmetric  part 
of  the  elemental  stiffness  matri.x.  K"*.  contains  45*(45  -r  I);2  =  1035  words.  If  double  precision  is 
used,  the  storage  of  K"*  amounts  to  1035*64  =  66  240  bits,  which  c.xcccds  the  65  536  bits  that  arc 
available  on  a  single  CM_2  processor.  On  the  other  hand,  if  single  precision  is  used,  the  storage  of 
K'^’  requires  33  120  bits,  so  that  32416  bits  arc  left  for  the  storage  of  the  vectors  d'^’.  d"'.  the 
elemental  lumped  mass  vector  M'*’*  and  the  forces  F""'  and  However,  even  in  the  latter 
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Figure  5.  A  two-step  NEWS  nechanism  on  a  regular  mesh 


case,  only  a  vp  ratio  of  1  can  be  used.  This  limits  the  size  of  the  finite  element  mesh  to  the 
maximum  number  of  processors  available  on  the  CM_2  at  hand.  Also,  it  inhibits  further 
performance  enhancement,  as  outlined  in  Section  2.1. 

Fortunately,  in  our  case  the  above  storage  requirements  can  be  considerably  decreased.  The 
nature  of  explicit  computations  is  such  that  F‘"(d")  can  be  directly  computed  from  the  displace¬ 
ments  at  t"  and  the  stress-strain  constitutive  equation.  As  a  result,  the  solution  process  defined  in 
(3)  involves  only  rector  quantities  which  do  not  require  a  large  amount  of  storage,  so  that  vp 
ratios  between  1  and  4  are  possible.  However,  the  reader  s’'''uld  keep  in  mind  that  the  current 
local  memory  size  of  a  CM_2  processor  may  penalize  sophisticated  high  order  elements  and 
implicit  finite  element  algorithms  in  general.  This  restriction  is  not  encountered  on  other 
commercially  availaole  hypcrcubcs  such  as  iPSC,  MCUBE  ana  AMETEK  among  others. 

2.2.2.  The  NE’>'S  grid  and  finite  element  patches.  Consider  the  regular  finite  element  mesh 
shown  in  Figure  5.  Except  on  the  boundu'ies,  each  element  is  connected  in  the  same  pattern  to 
exactly  eight  other  elemerts.  Consequently,  during  the  explicit  time  integrafion  algorithm,  each 
processor  communic.T.es  with  its  neighbour,",  in  the  same  manner.  Interprocessor  communication 
can  be  performed  with  a  two-.;tep  iV£ IFS  mechanism  (Figure  5).  However,  the  beauty  of  the  finite 
element  method  resides  in  the  fact  that  it  solves  models  with  irregular  meshes.  Typically,  a  finite 


TRANSIENT  FINITE  ELEMENT  COMPUTATIONS 


33 


Figure  6.  Transition  zones 


element  mesh  consists  of  several  patches  which  are  connected  together  using  irregular  transition 
regions  (Figure  6).  For  these  often  encountered  cases,  the  NEWS  grid  becomes  impractical. 
Rather,  the  router  has  to  be  utilized.  In  Section  4.  we  describe  how  a  distnbuted  data  structure  can 
guide  the  router  during  this  process. 

2.2.3.  SIMD  hardware  vs.  MIMD  finite  element  computations.  Typical  finite  element  meshes 
comprise  more  than  one  type  of  element.  Consider  the  case  where  a  discretized  region  is  modelled 
with  shell  elements  that  are  stiffened  with  beam  elements.  Clearly,  the  instructions  associated  with 
the  shell  elements  differ  from  those  associated  with  the  beam  elements.  Consequently,  the  vps 
which  are  assigned  to  shell  elements  and  the  vps  which  are  assigned  to  beam  elements  cannot 
execute  their  segments  of  code  in  parallel;  for  example,  the  be'-m  processors  have  to  execute  first, 
then  the  shell  processors.  If  Tf,  and  T,  denote  the  execution  times  associated  with  the  instructions 
for  P  beam  and  a  shell  element  respectively,  the  total  elapsed  parallel  time  for  a  single  instruction 
o\  er  the  set  (beams  +  shells)  on  an  SIMD  multiprocessor  is  +  T^.  On  an  MIMD  multiproces¬ 
sor,  this  elapsed  parallel  time  is  maxIT^  +  TJ.  Similar  situations  arise  when  during  the  loading 
some  elements  turn  to  be  matv-rially  non-linear  and  some  remain  linear.  In  this  case,  one  should 
always  compute  the  linear  component  of  the  response  (the  elastic  stiffness  for  example)  before 
attempting  to  test  the  yielding  criterion.  However,  in  spite  of  these  disadvantages  SIMD 
programs  can  still  be  attractive,  because  they  tend  to  be  easier  to  debug  and  rarely  suffer  from  the 
synchronization  errors  which  are  typical  of  MIMD  codes. 

2.2.4.  Parallel  I/O  in  finite  element  computations.  At  each  time  step,  the  computed  displace¬ 
ments,  velocities,  accelerations  as  well  as  strains  and  stresses  need  to  be  stored  on  disks.  This 
represents  a  significant  amount  of  1,0  traffic.  It  has  been  our  experience  that  the  CM_2  Data 
Vault  system  is  efficient  at  reducing  the  corresponding  elapsed  time  (see  Section  7). 

2.2.5.  Real-time  graphics  animations.  The  massively  parallel  real-time  animation  of  the  mesh 
deformations  is  a  direct  consequence  of  the  availability  of  the  Frame  Buffer  and  decision  of 
assigning  a  vp  to  a  finite  element.  At  each  time  step,  after  the  node  displacements  are  found  all  of 
the  vps  concurrently  draw  the  outline  of  their  assigned  elements  on  the  graphic  screen.  The  result 
is  a  real-time  finite  element  animation. 
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3.  BENCHMARKING  THE  CM-2 

At  the  time  of  writing  this  paper,  the  CM_2  supports  three  high  level  languages:  C*  (pronounced 
see-star),  ‘Lisp  (pronounced  star-lisp)  and  CM-Lisp  (pronounced  see-m-lisp).  The  first  two  are 
extensions  of  C  and  Lisp  respectively.  Paris  is  somewhat  the  assembly  language  of  this  parallel 
processor. 

In  this  section  we  comment  on  the  results  of  a  set  of  timing  experiments  that  were  carried  out 
on  the  CM_2  of  the  Center  for  Applied  Parallel  Processing  (CAPP),  at  the  University  of 
Colorado.  Boulder.  Since  only  one  eighth  of  a  cube  was  available  on  this  system,  all  results  were 
obtained  using  8192  processors.  McBryan^  has  shown  that  all  results  demonstrated  on  subcubes 
of  the  CM_2  scale  essentially  linearly  to  the  65  536  processor  system.  Consequently,  throughout 
this  paper,  megaflop  rates  are  reported  after  they  are  linearly  scaled  to  the  full  configuration. 
These  experiments  provided  us  with: 

•  a  reference  performance  for  the  evaluation  of  our  approach  to  massively  parallel  finite 
element  explicit  computations 

•  the  influence  of  the  vp  ratio  and  that  of  the  high  level  language  compiler  on  attainable 
performances.  At  this  point,  we  remind  the  reader  that,  if  an  application  requires  an  amo^g 
of  local  memory  (per  processor)  the  highest  vp  ratio  possible  is  equal  to  the  closest 

of  two  to  the  ratio  between  the  maximum  amount  of  local  memory  available  on  the 
(currently  8  Kbytes),  and  m^. 

Table  I  reports  the  megaflop  rates  for  some  scientific  computations  on  the  CM_2  at  different  vp 
ratios.  All  statements  were  written  in  C*.  Each  statement  is  performed  by  each  processor  on  its 
variables.  All  variables  were  declared  parallel  (local)  and  float  (simple  precision),  except  variable 
dp  which  was  declared  mono  (serial)  float,  and  variable  i  which  was  declared  mono  integer. 
Timings  were  measured  using  the  cmtimer  routines.  Each  ‘  ’  operation  or  operation  was 

counted  as  one  flop. 

Based  on  these  results,  we  have  observed  the  following: 

1.  Floating  point  performance  is  enhanced  at  higher  vp  ratios.  This  is  due  to  the  fact  that  for  vp 
ratios  greater  than  one,  computations  in  the  Weitek  chip  are  pipelined. 

2.  Vector  saxpys  are  not  slower  than  scalar  ones.  This  is  because  memory  addresses  are 
computed  on  the  front  end.  The  additional  speed  noticed  for  vector  saxpys  is  thought  to  be 
due  to  the  overlapping  of  addressing  and  floating  point  computations. 

3.  C*  appears  to  handle  poly  (parallel)  assignments  poorly.  This  can  be  seen  by  comparing  the 
performances  of  the  dot  product  and  the  vector  multiply.  Each  of  these  two  vector 


Table  I.  Megaflop  rates  using  C* 
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operations  requires  one  floating  point  per  processor.  In  addition,  the  dot  product  requires 
a  reduction  (accumulation  phase)  which  necessitates  communication.  However,  at  high  vp 
ratios,  the  dot  product  is  twice  as  fast  as  the  vector  multiply!  (At  low  vp  ratios,  the  amount  of 
floating  point  computations  is  not  large  enough  to  amortize  the  price  of  communication.) 
Since  the  dot  product  does  not  store  any  value  in  the  processor  memory  and  the  vector 
multiply  stores  the  result  of  .x  *  y  back  into  this  leads  us  to  believe  that  the  C*  compiler 
generates  a  code  which  is  very  inefficient  at  handling  assignments.  This  also  explains  why  the 
saxpy  exhibits  a  higher  megaflop  rate  than  the  vector  multiply:  it  has  twice  as  many  floating 
point  computations  for  oiie  assignment. 

The  same  computations  were  repeated  using  *Lisp.  The  comparison  of  both  sets  of  timings  for 
the  maximum  vp  demonstrates  a  formidable  superiority  of  the  *Lisp  compiler  (see  Figure  7).  This 
is  partly  due  to  the  fact  that  it  has  been  used  longer  on  the  CM_2  than  C*.  In  spite  of  the  proven 
superior  efficiency  of  *Lisp  over  C*,  we  have  chosen  to  implement  our  finite  element  code  using 
C*  because  of  our  familiarity  with  C. 


4.  FINITE  ELEMENT  PARALLEL  DATA  STRUCTURES 
Consider  again  the  explicit  central  difference  algorithm: 

jn+u:  ^  jn-i/2  ^  /,ivi->(F«(t")  _  F'"(d'',  d")) 

d"^‘  =d"  +  /id'’'"‘'- 


(4) 


The  global  mass  matrix  M  is  assembled  once.  At  each  time  step  t",  the  computations  are 
dominated  by  the  evaluation  of  the  internal  forces: 


F'"  = 


e=  1 


« 

[z.s]‘^(Tda 

fi"’ 


where  <7  is  the  stress  vector,  S  are  the  shape  functions,  Z,  is  a  partial  derivative  operator  and  £2"’  is 
the  area  of  the  eth  finite  element.  Clearly,  the  parallel  computation  of  F'"  is  best  done  el- 
ement-by-element.  Thus,  equation  (1)  can  be  efficiently  integrated  in  parallel  if  the  CM_2  virtual 
processors  are  mapped  onto  the  elements  of  the  mesh.  This  is  a  departure  from  the  grid  point 
massively  parallel  computations  advocated  by  Thinking  Machines  Corporation  for  the  CM_2.‘^ 
First,  all  processors  compute  concurrently  the  local  forces  F“"*(£'’)  and  F'"“*(d",  d").  Next,  these 
contributions  are  accumulated  through  communications  among  processors  that  are  mapped 
onto  neighbouring  elements. 

In  this  section,  we  describe  the  finite  element  data  structures  which  we  have  selected  to  drive  the 
massively  parallel  computations  on  the  CM-2.  These  are  element  oriented,  while  similar  data 
structures  proposed  for  other  hypercubes  are  subdomain  oriented  (see  Farhat  ei  al.^*  and  Fox 
et  al.^%  In  Section  8,  we  give  further  comments  on  this  difference.  We  group  these  data  structures 
into  two  sets. 

The  first  set  of  data  structures  deals  with  element-level  parallel  computations.  To  be  able  to 
perform  locally  its  assigned  element-level  computations — that  is.  to  perform  these  computations 
without  interacting  with  the  front-end  machine  —each  processor  must  store  in  its  own  memory  its 
element  type  (truss,  beam,  shell,  ....  number  of  Gauss  points,  . .  . ),  its  element  material 
properties  (density,  parameters  and  coefficients  for  constitutive  equations,  damping  characteristi¬ 
cs,  thickness, . . .),  its  nodal  geometry  (nodal  co-ordinates,  number  of  nodes  per  element)  and  its 
boundary  conditions  (fixed,  free  degrees  of  freedom  at  each  node,  prescribed  forces  at  each  node). 
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Figure  7.  A  Comparison  of  *Lisp  and  C*  performances 
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This  information  is  compacted  in  one-dimensional  arrays.  In  addition,  each  processor  must  also 
store  in  its  memory  a  set  of  scalars  corresponding  to  computational  parameters  such  as  the  fixed 
time  step  h,  and  a  scalar  or  one-dimensional  buffer  for  the  temporary  storage  of  messages  to  be 
passed  to  neighbouring  processors. 

The  second  set  of  data  structures  provides  the  router  with  the  mechanism  for  parallel 
interprocessor  communication.  The  inability  of  the  NEWS  grid  to  handle  irregular  communica¬ 
tion  patterns  has  been  addressed  in  Section  2.2.  Let  p  denote  a  virtual  processor  and  its 
assigned  finite  element.  In  order  to  exchange  F'"‘*’(d")  and  virtual  processor  p  must  be 

able  to  identify  at  run  time: 

•  the  set  of  processors  mapped  onto  elements  adjacent  to  e^, 

•  the  nodes  that  shares  with  these  elements 

•  at  each  shared  node,  the  degrees  of  freedom  which  need  to  be  assembled. 

This  particular  information  is  vital  for  meshes  with  different  types  of  elements.  It  guarantees  that, 
for  example,  a  moment  is  not  accumulated  with  a  force,  or  that  a  force  in  the  x  direction  is  not 
accumulated  with  a  force  in  the  y  direction. 

If  the  above  information  is  gathered  in  a  global  form  on  the  front-end  machine,  most  of  the 
execution  time  which  elapses  during  the  accumulation  phase  would  be  due  to  message-passing 
between  the  CM_2  processors  and  the  front-end  computer.  On  the  other  hand,  if  this  information 
is  decentralized — that  is,  if  the  memory  of  processor  p  is  loaded  only  with  the  subset  of  that 
information  which  is  relevant  to  the  connectivity  of  Cp — the  accumulation  phase  can  be  performed 
without  any  message-passing  between  the  CM_2  and  the  front-end  computer.  Consequently, 
prior  to  any  computation,  the  memory  of  processor  p  is  loaded  with  the  following  one-dimen¬ 
sional  arrays: 

Proc-att-to-node  For  each  node  connected  to  e^,  it  contains  the  identification  of  the  processors 
that  are  mapped  onto  elements  which  are  also  connected  to  this  node.  These 
are  stored  in  a  stacked  fashion. 

Pointer  This  is  a  pointer  array.  It  stores  in  position  i  the  location  in  Proc-att-to-iwde 

of  the  list  of  vps  that  are  attached  to  the  node  in  the  ith  local  position. 
Location  For  each  entry  in  Proc-att-to-node,  this  array  specifies  the  local  position  of 

the  shared  node  in  the  processor  that  is  mapped  onto  an  element  adjacent  to  e^ 

The  above  arrays  are  set  up  by  the  dedicated  finite  element  mesh  analyser  which  was  presented 
by  Farhat  et  al}*  They  require  about  80  integer  words  per  processor.  Clearly,  this  is  a  very  small 
overhead.  The  mechanism  of  these  arrays  is  depicted  in  Figure  8  for  element  1.  The  mesh  patch  is 
composed  of  shell  and  beam  elements. 

There  is,  however,  one  penalty  associated  with  assigning  one  element  to  each  vp.  The  nodes 
which  are  common  to  several  elements  are  duplicated  in  their  corresponding  processors.  As 
a  result,  about  1 1  per  cent  of  the  total  memory  available  on  the  CM_2  is  wasted.  This  is  a  small 
price  for  the  highly  parallel  computations  that  are  achieved.  Given  the  low  cost  of  memory 
nowadays,  this  seems  a  worthwhile  trade  off.  Moreover,  this  assignment  allows  LO  manipula¬ 
tions  and  graphic  post-processing  to  be  trivially  parallelized.  At  each  time  step,  after  the  nodal 
displacements  are  found,  all  of  the  processors  draw  concurrently  the  outline  of  their  assigned 
elements  on  the  Frame  Duffer  and  send  back  the  results  to  the  front  end  in  parallel. 

5.  THE  DECOMPOSITION/MAP  NG  STRATEGY 

Since  the  mesh  irregularities  inhibit  the  exploitation  of  the  NEWS  grid,  we  rely  on  the  data 
structures  of  Section  4  to  guide  the  router  during  interprocessor  communication.  However,  there 
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Element  I 

Proc-att-to-node  [2,  3,  3. 2] 

Pointer  [I.  2,  2,  3,  S] 

Location  [!•  ~] 

Figure  8.  A  distributed  data  structure  for  interprocessor  communication 


is  Still  one  additional  problem  to  resolve.  Efficiency  in  massively  parallel  computations  requires 
the  minimization  of  both  the  distance  that  information  must  travel  and,  more  importantly,  the 
‘hammering’  on  the  router.  In  the  case  of  finite  element  computations,  this  implies  that  adjacent 
elements  must  be  assigned,  as  much  as  possible,  to  directly  connected  processors,  and  contention 
for  the  wire  connecting  neighbouring  chips  must  be  reduced.  This  defines  the  mapping  prob¬ 
lem — that  is,  it  defines  which  hardware  processor  is  to  be  mapped  onto  which  finite  element  of 
a  given  mesh. 

Farhat'*  developed  a  heuristic  algorithm  for  mapping  massively  parallel  processors  onto  finite 
element  graphs  and  presented  some  analytical  results  for  corresponding  efficiency  improvement. 
Basically,  the  algorithm  searches  iteratively  for  a  better  mapping  candidate  through  a  two-step 
procedure  for  the  minimization  of  the  communication  costs  associated  with  a  specific  parallel 
processor  topology.  Because  it  seeks  a  very  fast  solution  for  a  machine  with  thousands  of 
processors,  this  algorithm  does  not  guarantee  ‘the’  optimal  mapping.  However,  it  has  prcHuced 
very  encouraging  results  on  a  variety  of  non-uniform  two  and  three-dimension,  meshes. 

In  this  work,  we  adapt  the  mapping  algonthm  of  Reference  16  to  our  target  parallel  processor, 
the  CM_2.  The  65  536  processors  of  this  machine  are  packaged  into  4096  16-processor  chips,  each 
having  its  own  router  node.  The  4096  router  nodes  are  arranged  in  a  hypercube  of  dimension  12. 
To  cope  with  this  topology,  we  proceed  in  two  steps.  First,  we  decomposes  the  given  mesh  into 
4096  submeshes,  each  containing  16  connected  finite  elements.  Next,  we  apply  the  mapper  given 
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in  Reference  16  to  identify  which  hardware  chip  is  to  be  mapped  onto  which  submesh.  Finally, 
within  each  submesh,  the  elements  are  numbered  randomly  between  the  chip  number  and  the 
chip  number  +  15. 

Given  a  finite  element  mesh,  there  are  several  ways  to  decompose  it  into  16-element  submeshes 
(see  for  example  Farhat*^  and  Malone'®).  Here,  each  submesh  is  to  be  assigned  to  one  chip  of  the 
CM_2.  In  Figure  9,  10  and  11,  we  show  two  different  decompositions  for  a  discretized  square 
domain,  Z),  and  D,. 

Both  decompositions  yield  16  submeshes,  each  with  16  adjacent  elements.  Decomposition  Z), 
was  designed  to  minimize  the  communication  bandwidth — that  is,  the  maximum  number  of 


Figure  9.  Domain  to  be  decomposed 


Figure  10.  Decomposition  0,— bandwidth  minimization 
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Figure  11.  Decomposition  D, — interface  minimization 


different  chips  with  which  any  chip  needs  to  communicate.  It  can  be  seen  (Figure  12)  that  for  Z), 
the  bandwidth  equals  2,  while  for  D,  it  equals  8. 

It  should  be  remarked  that,  if  the  substructuring  approach had  been  chosen— that  is 
assigning  a  subdomain  to  a  physical  processor,  D,  would  have  been  more  efficient  than  D,.  For 
this  decomposition,  each  chip  would  buffer  the  contributions  of  its  interface  nodes  and  send  only 
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Figure  12(a).  Interchip  communication  pattern  for  0, 
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Figure  12(b),  Interchip  communication  pattern  for  D, 


two  messages,  one  to  the  chip  at  its  left  and  another  to  the  chip  at  its  right.  The  decomposition  D, 
requires  the  same  chip  to  send  up  to  8  buffered  messages.  These  messages  would  eventually  be 
shorter,  but  would  still  render  more  expensive  because  of  message  start-up  costs.  However,  we 
have  opted  for  a  virtual  processor  approach — that  is  assigning  one  element  to  a  virtual  processor, 
for  reasons  that  are  given  in  Section  8.  For  this  case,  processors  e,xchange  information  one  node  at 
a  time,  so  that  the  number  of  interface  nodes  associated  with  a  decomposition  is  more  important 
than  its  bandwidth.  The  reader  can  confirm  that  decomposition  D,  delivers  255  interface  nodes, 
while  D,  delivers  only  93.  Indeed,  there  is  another  equally,  if  not  more  important,  reason  why  D, 
is  better  for  the  CM_2  than  D^ .  In  the  case  of  D, ,  all  of  the  16  processors  of  any  chip  communicate 
simultaneously  with  a  set  of  processors  which  are  on  the  same  neighbouring  chip  (Figure  12'.  This 
generates  a  significant  amount  of  contention  for  the  single  wire  that  connects  these  two  chips.  In 
the  case  of  however,  one  can  observe  (Figure  14)  that: 

•  for  each  chip,  only  12  out  of  the  16  processors  communicate  with  processors  onto  another 
chip 

•  only  3  processors  out  of  these  12  communicate  simultaneously  with  the  same  neighbouring 
chip,  so  that  much  less  contention  occurs  for  the  wire  connecting  the  two  chips.  Wc  recall 
that  each  chip  is  connected  with  up  to  12  other  ones  using  12  dificrent  wires  which  can 
operate  in  parallel. 

The  decomposition  D^  was  obtained  using  a  general  purpose  finite  clement  decomposer 
presented  by  the  first  author  in  Reference  17.  We  advocate  its  use  in  conjugation  with  the  mapper 
given  in  Reference  16  for  massively  parallel  computations  on  the  CM_2.  The  efficiency  improve¬ 
ment  potential  of  this  preprocessing  phase  is  demonstrated  with  the  following  finite  element  wave 
propagation  problem.  Plate  2  shows  the  discretization  of  a  tapered  cantilever  beam.  The  beam  is 


/ 
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Figure  14.  Wire  irallic  for  dccomposilion  0; 

modelled  with  4-node  isoparametric  elements  and  linearly  clstic  plane  stress  constitutive  equa¬ 
tions.  It  is  fixed  at  one  end  and  subjected  at  the  tip  of  the  other  to  an  impact  point  loading.  The 
wave  propagation  nature  of  the  problem  dictates  the  meshing  technique  to  create  elements  which 
are,  as  far  as  possible,  of  equal  size.  Since  the  beam  is  tapered,  transition  zones  with  irregular 
elements  had  to  be  introduced.  Other  mesh  irregularities  are  due  to  the  presence  of  a  region  with 
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a  hole.  The  complete  mesh  contains  8192  elements,  which  corresponds  to  an  8K  CM_2.  The  use 
of  a  naive  mapping  (element  i  into  processor  i  -  1)  would  have  resulted  in  a  ma,ximum  routing 
distance  between  adjacent  elements  equal  to  9.  Our  decomposer,  mapper  reduces  this  distance  to 
5.  If  EFF  denotes  the  efficiency  (speed-up  per  processor)  of  the  parallel  computations  using 
a  naive  mapping,  and  /  is  the  factor  by  which  the  decomposer/mapper  reduces  the  ma,ximum 
routing  distance  between  adjacent  elements,  the  theoretical  improved  efficiency**  is  given  by 


For  this  problem,  we  have  measured  an  efficiency  EFF  =  40  per  cent  on  an  8K  CM_2.  Since 
/=  9/5,  the  predicted  improved  efficiency  is  EFF*  =  54  per  cent.  A  second  run  of  the  problem 
using  the  decomposer/mapper  has  revealed  a  measured  improved  efficiency  EFF*  =  60  per  cent. 
The  discrepancy  between  the  predicted  and  measured  improved  efficiencies  is  due  to  the  fact  that 
(5)  does  not  account  for  the  wire  contention  problem. 


6.  FLOWCHART  OF  THE  MASSIVELY  PARALLEL  TRANSIENT  SIMULATION 


The  overall  organization  of  the  solution  on  the  CM_2  of  a  transient  dynamic  problem  using  the 
explicit  central  difference  algorithm  is  depicted  in  Figure  15.  It  consists  of  four  phases,  namely: 
mesh  preprocessing,  data  loading,  number  crunching  and  data  unloading. 

A  conservative  stable  time  step  for  the  central  difference  algorithm  is  given  by 


h 


(6) 


where  wJSi,  is  the  maximum  clement  frequency  of  the  undamped  dynamic  problem.  Belytschko 
has  pointed  out  that  it  is  in  fact  usually  not  practical  to  compute  the  maximum  eigenvalues  of  the 
element  directly,  for  this  would  increase  the  cost  of  computation  considerably.*'*  Instead, 
formulas  for  upper  bounds  on  have  been  recommended.  However,  on  massively  parallel 
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Compute  Velocities,  Displacements,  Strains  and  Stresses  (CM.2) 

Visualize  Results  (CM.2  -  FVnme  Duffer) 

Archive  Results  (CM.2  -  Data  Vault) 

1 

Figure  15.  Solution  of  a  transient  problem  on  the  C.M.2 
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Figure  16.  Interprocessor  communication  for  a  hybrid  patch 


processors  such  as  the  CM_2,  the  paraileiism  inherent  in  the  computation  of  is  such  that  this 
frequency  is  obtained  at  the  cost  of  the  frequency  of  one  single  element. 

The  interprocessor  communication  mechanism  for  a  mesh  with  more  than  one  type  of  element 
is  illustrated  in  Figure  16.  For  the  e.xample  shown,  the  4-node  elements  arc  activated  first.  They 
communicate  in  four  steps,  one  node  at  a  time.  Nc.xt.  the  4-node  clcmcats  are  dc-activated  and  the 
truss  elements  are  selected.  These  communicate  in  two  steps.  As  c.xpiained  in  Section  2.2.  the 
serialization  between  different  types  of  elements  is  due  to  the  SIMD  nature  of  the  CM_2. 

7.  EXAMPLES 

In  this  section,  we  apply  our  approach  to  massively  parallel  finite  element  explicit  computations 
to  the  solution  of  various  transient  problems  on  an  8K  CM_2  with  Weitek  accelerators.  We 
analyse  performance  results  in  detail.  We  assess  the  efficiency  of  our  decomposition,  mapping 
strategy  at  reducing  communication  time.  We  highlight  the  impact  on  machine  performance  of 
variations  in  mesh  topology,  finite  element  modelling  and  problem  non-Iineantics.  We  also  report 
on  the  performance  of  the  Data  Vault  system  for  problems  that  arc  I/O  bound. 

For  each  example,  two  simulations  were  carried  out.  The  first  one  assumed  a  linear  elastic 
material.  In  the  second  simulation,  the  material  was  assumed  to  have  an  elastoplastic  behaviour 
governed  by  a  von  Miscs  yield  condition. 
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7.1.  El:  Transient  response  of  a  cracked  aluminium  plate 

The  quarter  of  a  mesh  in  Figure  17  was  generated  to  study  the  dynamic  response  of  a  cracked 
aluminium  plate  under  a  uniform  time  varying  loading.  The  full  mesh  contained  a  total  of  4008 
plane  stress  elements  and  4073  nodes.  Mesh  irregularities  were  induced  by  transition  zones.  The 
NEIVS  grid  could  not  be  used. 

7.2.  E2:  Wace  propagation  in  a  three-dimensional  bar 

The  second  e.xample  considered  was  the  impact  of  a  metallic  bail  on  an  unsupponed  glassy  bar. 
The  bar  was  discretized  using  8160  brick  elements  (Figure  18).  The  finite  clement  mesh  contained 
13  500  nodes  and  40  500  degrees  of  freedom.  Given  the  regularity  of  the  discretization,  the  SEWS 
grid  was  used  for  interprocessor  communication.  This  c.xampic  was  also  re-run  using  the  router 
for  performance  comparison. 


7.3.  E3:  Shuttle  docking  induced  vibrations  in  a  space  station 

This  dynamic  analysis  was  carried  out  to  investigate  the  vibrations  of  a  space  station  model 
assembled  from  5-m  crcctablc  struts.  These  vibrations  were  assumed  to  be  induced  by  a  shuttle 
docking.  The  finite  clement  model  (Figure  19)  comprised  7584  three-dimensional  truss  elements 
and  2304  nodes.  It  was  generated  by  aligning  identical  cells  along  various  a.xcs.  However,  each  cell 
by  itself  was  irregular  and  did  not  allow  the  use  of  the  NEWS  grid. 


7.4.  E4:  Three-dimensional  glassy  bar  on  an  elastic  foundation 

The  wave  propagation  e.xampie  problem  E2  was  repeated  with  different  boundary  conditions. 
The  glassy  bar  was  assumed  to  be  supported  by  a  layer  of  foam.  The  mesh  was  comprised  of 


Figure  IT.  A  quarter  of  3  mah  for  a  cnclcctl  plate 
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a  total  of  8164  elements  (which  is  very  close  to  the  number  of  elements  in  the  former  mesh),  of 
which  1636  truss  elements  were  used  to  model  an  clastic  foundation. 

7.5.  Performance  results  and  analysis 

The  large  majority  the  code  segments  was  written  in  C*.  Occasionally  we  have  used  Paris 
functions  to  speed  up  some  manipulations.  Floating-point  arithmetic  was  performed  in  single 
precision  (32  bit  words).  Measured  performance  results  are  gathered  in  Tables  II.  Ill,  IV,  V  and 
VI.  The  reported  Mflops  rates  account  for  every  integer  and  floating  point  operation,  whether 
used  for  addressing  or  number  crunching.  Only  example  E2  could  make  use  of  the  NEIVS  grid. 
However,  all  timings,  except  those  given  in  Table  VI,  correspond  to  runs  where  communication 
was  carried  through  the  router.  Execution  times  are  given  in  seconds  and  correspond  to  a  sample 
of  2000  time  integration  steps  and  a  vp  ratio  equal  to  1. 


Table  II.  Overall  measured  performance  for  various  transient  finite  element  computations 

Exan>i. 

Mesh 

pre-processing 

Data  loading 
in  the  CM-2 

Equation  of  motion 
solying 

Sustained 

Mflops 

El — eiaJi 

1'04  sec 

5-47  sec 

861  sec 

400 

El — elastoplastic 

104  sec 

5-47  sec 

1033  sec 

480 

E2 — elastic 

1'98  sec 

31-78  sec 

4139  sec 

392 

E2 — elastoplastic 

1-98  sec 

31-78  sec 

4718  sec 

440 

E3— elastic 

1-28  .sec 

13-56  sec 

887  sec 

254 

E3 — elastoplastic 

1'28  sec 

13-56  sec 

896  sec 

256 

— elastic 

2' 11  sec 

33-00  sec 

4770  sec 

340 

E4 — ^'j’stoplastic 

2-11  sec 

33-00  sec 

5440  sec 

386 

Table  ill.  Data  Vault  system  performance 


Exam'.!'. 

Solving  equation 
of  motion 

Unloading  results 
on  front  end 

Unloading  results 
on  Data  Vault 

El 

861  sec 

5340  sec 

3-8 1  sec 

E2 

4139  sec 

16400  sec 

12'6l  sec 

E3 

887  sec 

9500  sec 

7'04  sec 

Table  IV.  Computation  vs.  communication 


E.xample 

Solving  equation 
of  motion 

Computation 

time 

Communication 

time 

El 

861  sec 

460  sec 

401  sec 

E2 

4139  sec 

1959  sec 

2180  sec 

£3 

887  sec 

260  sec 

627  sec 

E4 

4770  SI 

2340  sec 

2430  sec 
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Table  V.  True  communication  time 


Example 

Computation 

time 

Effective 

communication  time 

Software 

overhead 

El 

460  sec 

81  sec 

320  sec 

E2 

1959  sec 

1380  sec 

1280  sec 

E3 

260  sec 

146  sec 

481  sec 

Table  VI.  Router  vs.  NEWS  grid 


Example 

Computation 

time 

Communication  time 
using  the  NEWS  grid 

Communication  time 
using  the  router 

E2 

4139  sec 

560  sec 

2660  sec 

The  mesh  pre-processing  phase  corresponds  to  the  decomposition  of  the  finite  element  mesh,  as 
explained  in  Section  5.  It  also  includes  the  setup  of  the  finite  element  parallel  data  structure,  which 
is  then  distributed  across  the  processors.  Both  of  these  phases  are  shown  to  require  relatively  very 
little  computer  time.  It  can  also  be  observed  that,  in  the  worst  case,  the  non-linear  computations 
consume  only  about  15  per  cent  additional  time.  This  is  due  to  the  explicit  nature  of  the  radial 
return  mapping  algorithm  that  was  used.  Because  of ‘what  you  see  is  what  you  get’,  the  reported 
Mflop  rates  should  be  compared  to  those  measured  in  Section  3  and  not  to  the  theoretical  peak 
performance  of  the  machine.  It  should  also  be  noted  that  our  C*  code  still  leaves  room  for  further 
op'imizations. 

For  examples  El,  E2.  and  E3,  the  computed  displacements,  strains  and  stresses  were  archived 
on  secondary  storage  after  each  time  integration  step.  Two  solutions  were  compared.  In  the  first 
case,  these  results  were  brought  back  to  the  front  end  and  stored  in  appropriate  disk  files.  For  that 
case,  the  measurements  given  in  Table  III  demonstrate  that  the  amount  of  involved  I/O 
dominated  the  simulation  total  time.  In  the  second  case,  the  results  were  transferred  in  parallel 
directly  to  a  Data  Vault  system.  The  speed-up  provided  by  the  Data  Vault  is  shown  to  be  of  the 
order  of  1400!  This  parallel  I/O  capability  is  what  was  most  lacking  on  earlier  hypercubes.’® 

If  Tep  and  are  respectively  the  computation  parallel  time  and  the  communication  parallel 
time,  and  is  the  number  of  available  processors  on  a  given  parallel  machine,  the  achieved 
efficiency  (speed-up  per  processor)  can  be  expressed  as 


1  ’V,r,p 

Np  Tep  +  T^rn 


1 


The  results  given  in  Table  IV  indicate  that  efficiencies  of  53,  47,  29  and  49  per  cent  are  achieved 
respectively  for  examples  El.  E2.  E3  and  E4.  If  one  refers  to  the  performance  results  of  Section  3. 
it  can  be  seen  that  the  sustained  Mflop  rates  reported  in  Table  II  are  consistent  with  these 
efficiencies.  At  the  first  glance,  these  efficiency  results  appear  to  be  very  pessimistic.  However,  tliey 
are  well  above  the  10  per  cent  often  obtained  on  current  vector  supercomputers.’®  The  reader  can 
observe  that  the  timing  results  for  example  E4  are  very  close  to  the  cumulative  timings  of 
examples  E2  and  E3.  which  illustrates  the  impact  of  the  SIMD  nature  of  the  CM_2  on  the  MIMD 
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nature  of  finite  element  computations.  It  should  also  be  noted  that,  while  th.  rommunication  time 
is  fixed  for  a  given  mesh,  the  computation  time  increases  with  the  complexity  of  the  analysis. 
Thus,  highly  non-linear  formulations  which  include  large  deformations  are  expected  to  yield 
higher  efficiencies  than  those  deduced  from  Table  IV. 

At  this  point,  we  give  further  details  regarding  interprocessor  communication  in  the  context  of 
finite  element  explicit  computations.  As  outlined  in  Section  5,  the  finite  elements  of  a  mesh 
exchange  their  local  contributions  one  node  at  a  lime.  For  a  given  finite  element,  this  information 
exchange  procedure  is  organized  around  two  nested  loops.  The  outer  loop  is  carried  over  the 
nodes  that  are  connected  to  this  element.  The  inner  loop  is  carried  over  the  neighbouring 
elements  that  are  attached  to  each  local  node.  Using  a  C  notation,  this  is  written  as: 

for  (node  =  I:  node  ^  my^nodes;  node+  )  (7) 

{ 

start  =  pointer  [node];  stop  =  pointer  [node  -1-  \]  —  1; 
for  ( position  =  start;  position  ^  stop;  position  +  +  7  (8) 

{ 

neighbor  =  proc-att-to-node  [ position  j; 
exchange  (variable,  myself,  neighbor): 

} 

} 

where  my^nodes  is  the  total  number  of  nodes  that  are  connected  to  a  given  finite  element  and 
proc-att-to-node  is  the  array  containing  the  identification  of  the  neighbouring  elements.  Clearly, 
these  variables  are  element  dependent.  The  total  number  of  communications  to  be  performed  by 
one  processor  is  determined  by  the  product  =  d*{pointerlmy~nodes  +  1]  —  1)  which  is  both 
element  and  mesh  dependent.  The  CM_2  being  an  SIMD  machine,  the  communication  time  is 
determined  by  max<,[PS.m].  For  a  regular  mesh  composed  of  three-dimensional  truss  elements 
(d  =  3)  or  4-node  plane  elements  [d  =  2),  every  node  is  attached  to  4  elements,  so  that  24 
communication  instructions  per  time  integration  step  are  required  for  the  truss  element  and  32  for 
the  4-node  plane  element.  However,  Table  IV  indicates  that  the  space  station  example  exhibits 
a  longer  communication  time  than  does  the  aluminium  plate  problem.  The  reason  is  that  in  the 
mesh  of  example  E3,  some  truss  elements  are  connected  to  12  other  elements.  Because  of  the 
SIMD  nature  of  the  CM_2.  the  element  with  the  highest  degree  of  connectivity  determines  the 
communication  time.  For  a  regular  mesh  with  8-node  solid  elements  (J  =  3)  each  time  integration 
step  is  followed  by  1 92  communication  steps,  since  each  node  can  be  attached  up  to  eight  different 
elements.  This  is  reflected  in  Table  IV  where  example  E2  is  shown  to  possess  by  far  the  longest 
communication  time  (2180  sec).  In  summary,  ;hc  amount  of  communication  involved  in  finite 
element  explicit  computations  on  the  CM_2  is  determined  by  the  element  topology  and  order, 
and  the  mesh  irregularities.  Because  only  <i  nodal  information  is  exchanged  at  a  time  among  the 
CM_2  processors,  three-dimensional  and  high  order  elements  substantially  increase  the  com¬ 
munication  time.  Mesh  irregularities  also  adversely  affect  the  amount  of  communication  bec.ause 
of  the  SIMD  nature  of  the  CM_2.  It  is  interesting  to  note  that  elements  which  transmit  physical 
information  across  edges  and  faces  such  as  those  proposed  by  De  Veubeke,’*  would  require  much 
less  communication  than  traditional  elements.  The',e  elements  should  be  '•evisited  for  computa¬ 
tions  on  massively  parallel  processors  such  as  the  CM._2. 
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An  in-depth  investigation  of  the  communication  phase  was  earned  out.  It  was  found  that  most 
of  the  communication  time  was  elapsed  in  the  header  of  loop  (8).  This  loop  header  involves  the 
quantities  start  and  stop  which  differ  from  one  processor  to  another  in  the  presence  of  mesh 
irregularities  and  different  element  types.  Consequently,  the  front-end  computer  has  to  process 
and  manage  several  different  loops  rather  than  a  unique  one,  which  is  not  very  efficient  on  an 
SIMD  machine.  The  time  associated  with  the  headers  of  loops  (7)  and  (8)  is  referred  to  as  software 
overhead  in  Table  V.  The  true  time  that  is  elapsed  in  effective  communication  among  the 
processors  is  shown  to  be  only  a  fraction  of  the  overall  communication  time  (see  Table  V). 

Because  it  was  designed  to  handle  arbitrary  meshes,  our  C*  code  did  not  make  use  of  the 
NEWS  grid  package.  Hosvever,  a  special  module  that  incorporated  calls  to  the  NEWS  grid  was 
written  specifically  for  the  regular  mesh  of  example  E2.  Execution  times  for  this  example  using 
both  the  NEWS  grid  and  the  router  are  shown  in  Table  VI.  Clearly,  a  high  price  is  paid  for  the 
handling  of  eventual  mesh  irregularities. 

However,  the  irregular  pattern  of  communication  is  fixed  in  time.  Thus,  a  considerable 
improvement  can  be  achieved  if  this  pattern  is  evaluated  at  the  first  time  step,  then  somehow 
stored  in  the  CM_2  for  use  during  subsequent  time  steps.  We  believe  that  this  is  an  issue  that 
massively  parallel  computer  architects  should  investigate. 


2000  Time  Integration  Steps 


SECONDS 


Figure  20.  The  tlccomposcr/mapper  performance 
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In  order  to  assess  the  performance  of  the  decomposer/mapper  module,  examples,  El,  E2  and 
E3  were  re-run  with  the  naive  shifted  identity  mapping  (element  /  in  processor  i  -  1 ).  Figure  20 
demonstrates  that  the  true  communication  time  cun  be  reduced  by  as  much  as  60  per  cent. 
Unfortunately,  the  total  execution  time  is  reduced  only  between  10  and  17  per  cent  because  of  the 
communication  software  overhead  associated  with  mesh  irregularities. 


8.  CONCLUDING  REMARKS 

We  have  reported  herein  on  our  experience  in  performing  transient  finite  element  computations 
on  the  CM_2.  We  have  presented  the  architectural  features  of  this  parallel  processor  and 
discussed  their  impact  on  finite  element  computational  strategies.  In  particular,  those  features 
which  distinguish  the  CM_2  from  earlier  hypercubes  have  been  emphasized.  These  include  the 
virtual  processor  concept  and  the  fast  parallel  I/O  capabilities.  The  processor  memory  memory 
size  of  64  Kbits  has  been  shown  to  penalize  high  order  elements.  We  have  also  described  and 
discussed  a  domain  decomposition  strategy  and  a  mapping  algorithm  which  are  suitable  for 
massively  parallel  processors  such  as  the  Connection  Machine.  The  main  idea  behind  the 
decomposition  technique  is  the  minimization  of  both  the  amount  of  wire  contention  within 
a  chip,  and  the  amount  of  communications  between  different  chips.  A  given  finite  element  mesh  is 
partitioned  into  16-element  subdomains  which  correspond  to  the  16-processor  chips  of  the 
Connection  Machine.  This  partitioning  is  carried  out  in  a  way  that  minimizes  the  number  of 
nodes  at  the  interface  between  the  subdomains.  As  a  result,  only  those  processors  which  arc 
mapped  onto  finite  elements  at  the  periphery  of  a  subdomain  communicate  with  processors 
packaged  on  different  chips.  Moreover,  this  partitioning  is  such  that  the  connectivity  bandsvidth 
of  the  resulting  subdomains  is  large  enough  to  allow  an  efficient  use  of  the  interchip  wires.  The 
mapping  algorithm  attempts  to  reduce  the  distance  information  has  to  travel  through  communi¬ 
cation  network.  In  essence,  the  algorithm  searches  iteratively  for  an  optimal  mapping  through 
a  two-step  minimization  of  the  communication  costs  associated  with  a  candidate  mapping. 
Various  issues  related  to  the  single  instruction  multiple  data  stream  nature  of  the  CM_2  and 
pertinent  to  computational  mechanics  have  been  addressed.  Measured  performance  results  for 
realistic  two-  and  three-dimensional  transient  problems  have  been  reported.  Three-dimensional 
and  high  order  elements  have  been  shown  to  induce  longer  communication  times.  Mesh 
irregularities  have  been  shown  to  slow  down  the  computation  speed  in  many  ways.  The  Data 
Vault  has  been  demonstrated  to  be  very  effective  at  reducing  the  I/O  time. 

Now,  we  briefly  highlight  some  additional  implementational  and  theoretical  issues  that  we 
hope  will  materially  advance  the  application  ranges  of  finite  element  computations  on  this  highly 
parallel  processor. 

8.1.  Virtual  processor  ratio  rs.  .substructuring 

In  this  work,  we  have  assigned  when  possible  more  than  one  finite  element  to  a  single  processor 
using  the  virtual  processor  feature  of  the  CM_2.  However,  another  way  to  obtain  the  same  result 
is  to  assign  a  substructure  to  an  individual  processor. ‘ From  a  numerical  point  of  view,  both 
approaches  are  equivalent.  However,  thcjc  two  distinct  approaches  differ  in  their  implementa¬ 
tions  and  may  perform  differently.  The  substructure  approach  requires  each  processor  to  work 
with  both  external  and  internal  data  structures.  The  set  of  external  data  structures  stores 
information  about  substructure  interconnections.  These  are  similar  lo  the  ones  described  in  this 
paper.  The  set  of  internal  data  structures  stores  the  connectivity  table  of  the  elements  within 
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a  substructure.  The  computations  within  each  substructure  are  carried  out  by  looping  over  the 
elements  of  that  substructure.  The  advantage  of  this  approach  is  a  saving  in  storage  since  the 
substructure  internal  nodes  are  uniquely  defined,  and  a  faster  computation  of  the  results 
associated  with  these  nodes.  Moreover,  the  global  results  at  the  internal  nodes  can  be  accumu¬ 
lated  without  any  explicit  call  to  a  message-passing  function.  The  global  quantities  at  the 
boundary  nodes  are  accumulated  using  the  router  and  the  external  data  structures.  However,  the 
substructuring  approach  requires  that  the  sequencer  broadcast  the  same  instruction  several 
times,  once  for  each  element  of  the  substructure,  which  increases  the  overall  wall  clock  execution 
time.  Moreover,  this  approach  does  not  allow  the  Weitek  chip  to  pipeline  the  computations  over 
the  elements  of  the  substructure. 

On  the  other  hand,  the  virtual  processor  approach  requires  that  each  element  communicate 
explicitly  with  its  neighbours,  even  if  these  are  assigned  to  the  same  processor.  Of  course,  this 
communication  is  virtual  since  it  is  within  the  processor  itself  and  generates  minimal  additional 
overhead.  On  the  positive  side,  the  virtual  processor  approach  utilizes  only  one  type  of  data 
structure  and  exploits  the  pipelining  capabilitieis  of  the  Weitek  chip.  The  latter  feature  signific¬ 
antly  enhances  overal  performance,  as  demonstrated  in  Section  3.  Consequently,  we  advocate  the 
use  of  the  virtual  processor  ratio  rather  than  the  substructuring  technique,  especially  if  the 
processor  memory  size  is  to  be  increased  in  the  future. 

8.2.  Implicit  algorithms  and  the  CM~2 

In  this  report,  no  attempt  has  been  made  to  design  a  novel  parallel  algorithm  for  the  solution  of 
the  differential  equation  of  motion.  We  have  selected  the  central  difference  algorithm  because  of 
its  inherent  parallelism,  which  allowed  us  to  focus  on  implementational  issues  and  to  fully  explore 
the  multiprocessing  capabilities  of  the  CM_2.  Our  experience  suggests  that  a  whole  class  of 
explicit  and  semi-implicit  dynamic  and  static  algorithms  can  be  implemented  on  the  CM_2  in 
a  very  similar  way.  Among  others,  we  cite  the  EBE  algorithms,*-  the  EBE  preconditioncrs’^  and 
the  Jacobi  preconditioned  conjugate  gradient  algorithm.-'^  However,  the  solution  of  some  static 
and  transient  problems  may  necessitate  the  use  of  an  implicit  algorithm,  which  usually  implies  the 
solution  of  a  set  of  simultaneous  banded  equations.  If  the  global  symmetric  stiffness  matrix  K  is 
banded,  with  semi-bandwidth  b,  then  it  is  well  known  (see  for  example  Ortega  and  Voigt**)  that 
Gaussian  elimination  methods  for  solving  Kd  =  F  allow  at  each  step  on  the  order  of  h~,  2  pairs  of 
(  +  ,  .x)  to  be  processed  concurrently,  but  require  significant  communication  because  the  h  entries 
of  the  pivot  column  must  be  made  available  to  all  other  processors.  Several  parallel  algorithms 
based  on  these  elimination  methods  were  designed  for  finite  element  applications  and  were 
implementated  on  earlier  hypercubes  (see  for  example,  Farhat  and  Wilson**  and  Utku  et  al.-~). 
Typically,  a  processor  'vas  assigned  to  a  set  of  matrix  columns.  Results  from  our  previous 
experience  with  the  early  version  of  Intel’s  iPSC  suggests  that  direct  solvers  are  feasible  on 
hypercubes  only  when  the  number  of  available  processors,  N^,  is  much  smaller  than  the 
bandwidth  h  of  the  given  finite  element  problem,  .so  that  communications  do  not  dominate 
computations.  On  the  iPSC-1,  a  message  that  was  sent  from  one  extreme  corner  of  a  5-di¬ 
mensional  cube  to  the  other  would  result  in  an  elapsed  time  475  times  longer  than  the  time  to 
perform  a  floating  point  multiplication  (see  Rudell,*®).  However,  on  a  10-dimensional  subcubc  of 
the  CM_2  we  have  measured  the  ratio  of  a  broadcast  to  a  floating  point  computation  to  be  only 
about  2.87,  This  observation  suggests  that,  for  problems  with  h  >  360.  a  processor  could  be 
mapped  onto  a  few  matrix  entries  and  a  parallel  direct  solver  could  be  feasible  on  the  CM_2.  For 
problems  with  smaller  bandwidth,  direct  solvers  which  operate  on  more  than  one  pivot  at 
a  time*’  *®  should  also  be  investigated  lor  implementation  on  massively  parallel  p  ocessors. 
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There  is  an  additional  issue  which  has  to  be  examined  before  attempting  to  solve  finite  element 
equations  on  the  CM_2  with  a  parallel  direct  solver.  This  issue  is  related  to  the  balance  on 
massively  parallel  processors  between  the  number  of  available  processors,  and  the  processor 
memory  size.  Let  M"  denote  a  two-dimensional  regular  n  by  it  finite  element  mesh,  where  n  is  the 
number  of  elements  along  one  side.  If  d  is  the  number  of  degrees  of  freedom  at  a  given  node,  the 
semi-bandwidth  of  M"  \s  b  =  d{n  -f  3)  and  the  total  number  of  mathematical  unknowns  is 
N  =  d[n  +  1)’.  For  this  mesh,  the  storage  cost  of  K  amounts  to  Nb  =  d-(n  -r  3){n  +  1)^  words. 
The  total  amount  of  storage  available  on  the  CM_2  is  S  =  N^*  m^,  where  is  the  number  of 
available  processors  and  =  8  Kbytes  is  the  curren»  size  of  the  processor  memory.  Let 
iV£  =  be  the  maximum  number  of  elements  for  which  M"  has  a  banded  stiffness  matrix  that 
can  be  factored  in-core  on  the  CM_2.  Table  VII  gives  the  values  of  NE  for  different  values  of 
d  and  for  the  case  of  a  fully  configured  Connection  Machine  (iVj,=  65  536).  Values  of  NE  are 
shown  for  both  single  precision  (32  bit  words)  and  double  precision  (64  bit  words)  floating  point 
arithmetic. 

Clearly,  e.xcept  for  the  case  where  d  =  2  and  floating  point  arithmetic  is  done  in  single  precision. 
NE  is  smaller  than  N^,.  Similarly,  the  case  where  .Vf "  is  an  n  by  n  three-dimensional  regular  mesh  is 
assessed  in  Table  VIII  for  various  values  of  d. 

For  this  case,  NE  is  much  smaller  than  N^,,  even  for  d  =  2  and  for  single  precision  floating  point 
arithmetic.  For  d  =  6  (some  shell  elements),  only  8000  elements  (4000  elements)  can  be  included  in 
M"  when  computations  are  carried  out  using  single  precision  (double  precision)  floating  point 
arithmetic. 

It  is  noted  that  the  eventual  solution  u.  ^  o/siem  of  equations  is  only  one  phase  of  several  finite 
element  computational  sequences.  In  linear  three-dimensional  analysis,  this  phase  dominates  the 
computer  execution  time.  However,  in  the  non-linear  analysis  of  flexible  space  structures  most  of 
the  computational  time  is  usually  spent  in  modules  that  perform  element  level  computations.^* 
These  include  the  evaluation  of  generalized  nodal  internal  forces  and/or  elemental  stiffness 
matrices.  Consider  now  a  mesh  M"  where  the  number  of  elements  NE  is  chosen  so  that  the  upper 
part  of  the  banded  stiffness  matrix  K  fills  the  N^  processor  memories  completely.  The  preceding 
complexity  analysis  demonstrates  that  the  balance  on  the  CM_2  between  the  number  of 
processors  and  the  memo’^v  ’  e  of  each  processor  is  such  that  NE  is  much  smaller  than  N^,. 
Hence,  if  a  direct  algorithm  i-  used  to  solve  a  finite  element  system  of  equations,  the  N^,  processors 
will  be  active  during  the  solution  phase,  but  .V^  —  E  processors  will  remain  id(e  during  the  rest  of 


Table  VII.  Number  of  allowable  elements  vs.  DOF/node  for  the  two-dimensional  case 


:V^=  65536 

d  =  l 

II 

d  =  4 

II 

d  =  6 

Single  precision 

NE 

102400 

59536 

40401 

29  929 

23409 

Double  precision 

NE 

64009 

37  249 

25281 

18769 

14884 

Table  VIII.  Number  of  allowable  elements  vs.  DOF  ’node  for  the  three-dimensiona'  'isc 

;V^=  65536 

r  1 

II 

II 

(i  =  4 

d  -  5 

11 

Single  precision 

NE 

29791 

19683 

13824 

10648 

8000 

Double  precision 

NE 

19683 

12167 

9261 

6859 

4913 
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the  phases  which  involve  element  level  computations.  Consequently,  an  in-core  direct  solution 
strategy  would  not  efficiently  utilize  the  computational  power  of  the  CM_2  in  a  highly  non-linear 
Unite  element  analysis. 
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must  also  store  in  its  memory  a  set  of  scalars  corresponding  to  compnlaliona)  which  are  common  to  adjacent  elements.  This  is  necessary  to  gnar- 

parameters  such  as  the  fixed  time  step  h,  and  a  scalar  or  ono-dimenr'onal  antce  that,  for  example,  a  moment  is  not  accumnlated  with  a  force, 

hnnbr  for  the  temporary  storage  of  messages  to  be  passed  to  neighboring  o*'  •'I'®*’  **  force  in  tho  x  direction,  is  not  accumulated  with  a  force  in 

processors.  V  direction. 


the  set  of  adjacent  elements  (processors) 

the  node(s)  it  shares  with  each  adjacent  element  (processor)  '  '  Typical  finite  element  mesh 
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Finite  Element  Simulations  on  CM  299 


Measured  Performance 

At  this  writing,  C*  does  not  support  some  of  the  floating  point  reduction 
FB  discretization  of  a  tapered  beam  -  capabilities  and  does  not  exploit  the  hardware  features  for  indirect  address- 
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